Aggressive Guix garbage collection on the Dam

On a multi-tenant GNU Guix system like the Dam, unprivileged users can trigger the download and installation of software in the store, the read-only part of the filesystem where all the software is.

Left unchecked, this capability will lead to the store eating all the available space.

Here we explain how we leverage GNU Guix's garbage collector to keep the store to a manageable size, fairly shared among users.

1. Illustrating what you need to know about Guix garbage collection

GNU Guix users install software in a profile, either explicitly specified with --profile, or residing in their default profile of ~/.guix-profile (which points to /var/guix/profiles/per-user/$USER/guix-profile).

Thus, the following command:

guix install --profile=/tmp/test-profile hello

creates a profile in /tmp/test-profile

profile.png

These profiles are merely directories of links to the store (which lives in /gnu/store/), where the actual software items are stored.

In the case of hello, for example:

$ ls -l /tmp/test-profile/ | grep hello
lrwxrwxrwx 1 root root   60 Jan  1  1970 bin -> \
    /gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1/bin

store.png

Each item in the store is part of a dependency graph, which links an item (typically a software package) to its dependencies.

For example, the hello package we just installed depends, among others, on glibc to run:

$ guix graph --type=references hello | grep glibc | head -n 1
"/gnu/store/5mqwac3zshjjn1ig82s12rbi7whqm4n8-hello-2.12.1" -> \
    "/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35" [color = darkgoldenrod];

And it depends, among others, on make to be built:

$ guix graph --type=bag hello | grep make | head -n 1
  "/gnu/store/3zh2qpi897s2x229s93iakji86b08a20-hello-2.12.1.drv" -> \
      "/gnu/store/34i09xrz49phnkij2c8k6ps37na6cr74-make-4.3.drv" [color = dimgrey];

store2.png

Profiles act as garbage collector roots, indicating which pieces of software are actively needed at runtime.

Store items that cannot be reached from any garbage collector root by following the runtime dependency graph are considered dispensable, and can be safely removed by the garbage collector without affecting user operations.
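The reachability rule can be sketched in a few lines of Python. This is a toy model and not Guix's implementation: the store is a hypothetical dictionary mapping each item to its runtime references, and the item names are shortened placeholders.

```python
# Toy model of garbage collection by reachability (not Guix's actual code).
# Keys are store items; values are the items they reference at runtime.
# All names below are illustrative placeholders.
STORE = {
    "profile": ["hello"],
    "hello": ["glibc"],
    "glibc": [],
    "make": ["glibc"],  # only needed at build time: no live item refers to it
}


def live_items(store, roots):
    """Return every item reachable from the garbage collector roots."""
    live, stack = set(), list(roots)
    while stack:
        item = stack.pop()
        if item not in live:
            live.add(item)
            stack.extend(store[item])
    return live


def collect(store, roots):
    """Return a copy of the store without the unreachable items."""
    live = live_items(store, roots)
    return {item: refs for item, refs in store.items() if item in live}


print(sorted(collect(STORE, roots=["profile"])))
# prints ['glibc', 'hello', 'profile']
```

In this model make is dropped because nothing reachable from the profile refers to it, even though it refers to glibc itself.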

After installing hello, the store grew by 185 MB! Running guix gc afterwards removed buildtime dependencies such as make; the final store size increase after the installation of hello came down to a much more reasonable 844 kB.

store3.png

To reclaim these 844 kB with a mainstream package manager, I would uninstall the hello package. This would not work with guix:

$ guix remove --profile=/tmp/test-profile hello
...
$ guix gc
...

At this point, the store did not shrink by 844 kB; it even became 24 kB larger!

Why? Because hello is still there:

$ ls -l /gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1
...

And why is it still there? Let's ask Guix:

$ guix gc --referrers /gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1
/gnu/store/6419i5z1xg96sxg6m43hyshfji91i863-profile

Wait, there's a profile still referencing hello? Yes, because old generations are kept in the store to allow one to roll back:

$ ls -l /tmp/test-profile*
lrwxrwxrwx 1 edouard users 19 May 11 16:43 /tmp/test-profile -> \
    test-profile-2-link
lrwxrwxrwx 1 edouard users 51 May 11 16:20 /tmp/test-profile-1-link -> \
    /gnu/store/6419i5z1xg96sxg6m43hyshfji91i863-profile
lrwxrwxrwx 1 edouard users 51 May 11 16:43 /tmp/test-profile-2-link -> \
    /gnu/store/03nwdiibi8rxha84a4s9fj535i4d2jdi-profile

The big caveat is that guix remembers every change you make to every profile. Uninstalling a program from a profile will not make the corresponding store item garbage collectible. Instead, a reference to it will be kept in a previous generation of the profile. Previous profile generations are as much of a garbage collector root as the latest profile.

store4.png
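A minimal sketch of why removing a package does not unroot it, assuming a simplified model in which a profile is just a list of generations (the class and method names are made up for this illustration and do not mirror Guix's internals):

```python
# Toy model: every generation of a profile acts as a garbage collector root.
# Class and method names are hypothetical, chosen for this illustration only.
class Profile:
    def __init__(self):
        self.generations = [frozenset()]  # start from an empty generation

    def install(self, package):
        # Installing never mutates a generation: it appends a new one.
        self.generations.append(self.generations[-1] | {package})

    def remove(self, package):
        # Removing also appends a new generation; the old one survives.
        self.generations.append(self.generations[-1] - {package})

    def delete_old_generations(self):
        self.generations = self.generations[-1:]

    def rooted_packages(self):
        """Packages kept alive by any generation, current or old."""
        return set().union(*self.generations)


profile = Profile()
profile.install("hello")
profile.remove("hello")
print(profile.rooted_packages())  # prints {'hello'}: an old generation still roots it
profile.delete_old_generations()
print(profile.rooted_packages())  # prints set(): hello can now be collected
```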

This design allows one to enjoy atomic updates and unlimited rollbacks. Disk space is cheap, and trading it for peace of mind is a great compromise: you can fearlessly update your system, since any breakage can be reverted by rolling back to the previous generation.

Also, in contrast with other tools such as pip or virtualenv, all the users share the same store. Installing the same software twice (whether by two different users, or by the same user in two different profiles) will only make the system store the software once. This deduplication saves a lot of space, as most users share a large common software base.

store5.png

Or at least it would save a lot of space, if it weren't for the biggest cause of store bloat: version drift.

Astute readers will have noticed that in the command line listings, packages and profiles don't have such nice names as hello or glibc. Instead, guix refers to them by longer names, e.g. 5mqwac3zshjjn1ig82s12rbi7whqm4n8-hello-2.12.1.

The part before the package name is a hash. It is a hash of the package source, but also of all the package's dependencies (buildtime and runtime). (Here is a good explanation. You may want to check the HN comments about it too.)

This means that the dependency graph we were talking about is not just a simple directed acyclic graph, but a Merkle directed acyclic graph, a structure that pops up basically everywhere in computer science.

As a consequence, if there is the slightest change (even one bit!) in any of the dependencies of a package, then the package's full name changes, and we need to store it in another store item.

For example, if our first user installs hello again, but in the meantime an update to make has been made, then the new hello will not be the same package as the old hello, despite its source and the source of all its runtime dependencies being the same.
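This propagation can be illustrated with a toy Merkle naming scheme. The sketch below is an assumption-laden simplification: it hashes a source string together with the full names of the inputs using SHA-256 truncated to 32 hex characters, whereas Guix hashes derivations and uses its own encoding. The function name store_name and the source strings are made up for the example.

```python
import hashlib


def store_name(name, source, inputs=()):
    """Name an item after a hash of its source AND the names of its inputs.

    Toy scheme: truncated SHA-256 in hex. Guix's real scheme differs.
    """
    digest = hashlib.sha256()
    digest.update(source.encode())
    for dep in sorted(inputs):
        # Each input name already embeds the hash of its own inputs,
        # so a change anywhere below propagates upwards.
        digest.update(dep.encode())
    return f"{digest.hexdigest()[:32]}-{name}"


glibc = store_name("glibc-2.35", "glibc source")
make_old = store_name("make-4.3", "make source", [glibc])
make_new = store_name("make-4.3", "make source, with a fix", [glibc])

# Same hello source, same glibc; only the buildtime dependency changed.
hello_old = store_name("hello-2.12.1", "hello source", [make_old, glibc])
hello_new = store_name("hello-2.12.1", "hello source", [make_new, glibc])

print(hello_old)
print(hello_new)
print(hello_old != hello_new)  # prints True
```

Both names end in hello-2.12.1, but the hash prefixes differ, so the two builds occupy two distinct store items.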

This may seem overly wasteful, but in the light of the recent supply chain attack on xz (see rsc's timeline and writeup, as well as the HN thread), trust me, this is a VERY GOOD THING. Guix was not affected by this very clever attack because it resists systemd's cancerous growth in the Linux ecosystem, but even if it had been, the mitigation would have been easier than on other systems.

The bad thing about this VERY GOOD THING is that it becomes very easy for the version (or more precisely, the name) of hello to drift between two profiles, which causes obsolete versions of hello to linger in the store.

(The word "version", and formal systems like Semantic Versioning, often imply a teleological view of new being better. Guix's Merkle naming system for packages eschews such a view: the hash is just a secure record of which source code is needed to build and run a piece of software. Some dependencies may go up in version, but some may go down (e.g. to revert a buggy update), and any new combination automatically gets a new name. Of course, meaningful version numbers have their place and are used internally in package definitions, but they are used in the pointer (the URL) to the source code; the hash of the actual content is what really drives the name of the package.)

So, let our first user reinstall hello after some change has happened somewhere in the dependency graph (e.g. in make):

store6.png

Now, our first user runs guix gc --delete-generations. This extra flag deletes the old generations of the profiles. This avoids keeping old versions around, and it is worth doing when you would rather have the disk space than the ability to roll back to a known working state.

store7.png

Alas, these efforts are thwarted by the existence of the other profile, which, being a garbage collector root, prevents the 5mqwa... version of hello from being collected.

The owner of the other profile must update too:

store8.png

Only then will a call to guix gc --delete-generations remove the old hello version and give us a garbage collected store, in which both profiles share the same hello.

store9.png

2. Aggressive garbage collection

To reclaim space on the Dam we periodically run a script that:

  • pulls the latest package definitions,
  • removes any cached profile (those are created e.g. when using guix shell),
  • updates all the other profiles,
  • deletes all old generations,
  • collects the garbage.

By force-updating everybody, we ensure that there is no version drift, and that deduplication works as expected.
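To make "version drift" concrete, here is a small sketch that spots it, assuming we already have, for each profile, the list of store item names it references. The profile owners alice and bob are hypothetical, and how the lists are obtained is left out; the two hello names are the ones shown in the listings earlier in this article.

```python
from collections import defaultdict


def split_item(item):
    """Split '<hash>-<name>-<version>' into (hash, '<name>-<version>')."""
    digest, _, package = item.partition("-")
    return digest, package


def drifting_packages(profiles):
    """Return the packages that appear under more than one hash."""
    seen = defaultdict(set)
    for items in profiles.values():
        for item in items:
            digest, package = split_item(item)
            seen[package].add(digest)
    return {pkg: hashes for pkg, hashes in seen.items() if len(hashes) > 1}


# Hypothetical owners; store item names taken from the listings above.
profiles = {
    "alice": ["6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1"],
    "bob": ["5mqwac3zshjjn1ig82s12rbi7whqm4n8-hello-2.12.1"],
}
print(drifting_packages(profiles))
```

Once both profiles are updated from the same package definitions, they reference the same store item and the function returns an empty dictionary.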

3. First caveat: pull, reconfigure, reboot, and then only collect the garbage

root learned the following lesson the hard way: one should:

  • guix pull,
  • guix system reconfigure,
  • reboot,
  • guix gc.

In that order.

Frantically trying to clear some space, root once guix pulled, reconfigured, deleted old system generations, garbage collected, and then rebooted.

It turned out that pulling and reconfiguring had introduced a bug in an activation script. This bug prevented the server from rebooting, but because the old system generations had been deleted and the garbage collected, root could not roll back to a previous known working state from GRUB, as guix usually allows. Lesson learned: only remove old system generations once you are confident the current one is bootable.

Furthermore, rebooting after a reconfigure aligns some "magic profiles", as explained in the doc, which makes garbage collection more effective:

Note: The running system generation—referred to by /run/current-system — is not necessarily the current system generation—referred to by /var/guix/profiles/system: it differs when, for instance, you chose from the bootloader menu to boot an older generation.

It can also differ from the booted system generation—referred to by /run/booted-system —for instance because you reconfigured the system in the meantime.

https://guix.gnu.org/manual/en/guix.html#Invoking-guix-system-1
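One way to turn this into a guard before deleting old system generations is to check that the three links named in the manual all resolve to the same store item. This is a sketch of the idea, not something our script is known to do; the function name is made up, and the default paths are the ones quoted above.

```python
import os


def generations_aligned(booted="/run/booted-system",
                        current="/run/current-system",
                        latest="/var/guix/profiles/system"):
    """Return True when the three symlinks resolve to the same target.

    If they agree, the machine has booted into the latest system
    generation, so that generation is known to be bootable.
    """
    targets = {os.path.realpath(path) for path in (booted, current, latest)}
    return len(targets) == 1


if generations_aligned():
    print("booted, running and latest generations agree")
else:
    print("generations differ: reboot before deleting old ones")
```

On a machine without these links the paths resolve to themselves, so the check fails closed and advises against deleting anything.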

4. Second caveat: avoid guix pull

4.1. Guix pull is slow and demanding

Guix pull takes several minutes per user, and hogs so much RAM that services such as listen must be stopped before one can guix pull. (I still have not understood the exact reason why; the issue has been raised multiple times on the mailing list but I did not understand the answers, see e.g. https://lists.gnu.org/archive/html/guix-devel/2024-05/msg00110.html)

4.2. Guix pull's cache is gigantic

Running guix pull from a pristine user home takes around 20 min (this is of course hardware-dependent) and creates a ~700 MB $HOME/.cache/guix directory.

Subsequent runs are of course faster (4 minutes instead of 20).

Nevertheless, running guix pull on behalf of each user would generate as many $HOME/.cache/guix directories as there are users, which would pretty quickly negate any deduplication gains in the store.

A solution exists: maintain a guix pull cache in a public directory, from where unprivileged users can then overlay it over their own cache directories. This is brittle and hard to get right, but doable:

mount -t overlay overlay -o \
lowerdir="/root/.cache/guix",\
upperdir="/home/$user/.cache/guix-overlay",\
workdir="/home/$user/.cache/guix-workdir" \
"/home/$user/.cache/guix"
bindfs --mirror=$user /home/$user/.cache/guix /home/$user/.cache/guix

But this acrobatic cache management is unneeded thanks to a more powerful solution.

4.3. Solution: don't pull, deploy

When using guix deploy, the very expensive "Computing Guix derivation" step is run, not on the target computer, but on the computer from which guix deploy is run.

This, coupled with calling beaverlabs-channels on the operating system, ensures that the system's guix, which lives in /run/current-system/profile/bin/guix, is the latest guix with all the channels up to date.

(define (beaverlabs-channels os)
  "Add beaverlabs channels by default. Without this, it won't remain after a reconfigure"
  (modify-service os guix
                  (channels %beaverlabs-channels)
                  (guix (guix-for-channels %beaverlabs-channels))))

Thanks to this, after a guix deploy, there is no need for anyone to run guix pull, as the system's guix is the latest one!

5. Third caveat: you need store space to free store space

Before you can reclaim the space taken in the store by old versions of the user's software, you must first update said software, which will take up even more space in the store.

Therefore, the store must be aggressively garbage collected before things get dire, otherwise it will not be possible to aggressively garbage collect the store.

Things need to get worse before they get better.
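A pre-flight check along these lines can warn before the store gets too full to be cleaned. This is a minimal sketch: the function name and the 15% threshold are arbitrary assumptions for illustration, not values we have tuned on the Dam.

```python
import shutil


def has_headroom(path="/gnu/store", min_free_fraction=0.15):
    """Return True if the filesystem holding `path` has enough free space.

    The 15% default is an arbitrary placeholder: the space really needed
    depends on how much the profile updates will download and build.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction
```

Calling has_headroom() before the update phase, and aborting or alerting when it returns False, keeps the cleanup from being started at a point where it can no longer complete.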

6. What we end up with: NukingGC.scm

The solution we end up with is the NukingGC.scm script in our channel. One can run it as root with:

guix repl /path/to/NukingGC.scm

It will

  • delete:
    • old system generations,
    • all cached profiles,
  • and garbage collect with --delete-generations

At this point, to make the store leaner, the script must make it bigger, temporarily. It:

  • updates the default profiles of all users
  • garbage collects with --delete-generations
  • updates all the remaining profiles, for all users
  • garbage collects one last time with --delete-generations.
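The ordering of the last four steps can be sketched as a dry-run planner that only builds the command lines and runs nothing. The actual NukingGC.scm is a Scheme script and may well work differently; this Python sketch merely assumes that each step maps onto the guix gc --delete-generations and guix package --profile=... --upgrade commands, and it leaves out the deletion of old system generations and cached profiles. The function name and the example profile paths are hypothetical.

```python
def nuking_plan(default_profiles, other_profiles):
    """Return the ordered command lines for the update-and-collect phases.

    Nothing is executed: the result is a list of argument lists.
    """
    collect = ["guix", "gc", "--delete-generations"]

    def upgrade(profile):
        return ["guix", "package", f"--profile={profile}", "--upgrade"]

    plan = [collect]                                   # initial collection
    plan += [upgrade(p) for p in default_profiles]     # store grows here
    plan.append(collect)                               # reclaim old defaults
    plan += [upgrade(p) for p in other_profiles]       # store grows again
    plan.append(collect)                               # final collection
    return plan


# Hypothetical profile paths, for illustration only.
for command in nuking_plan(
    ["/var/guix/profiles/per-user/alice/guix-profile"],
    ["/home/alice/project/profile"],
):
    print(" ".join(command))
```

Collecting between the two update phases limits the peak store size: the space held by the old default profiles is reclaimed before the remaining profiles start pulling in their own updates.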

7. Further work

The script has two blind spots:

  • the user's guix: as we don't use guix pull, each user's own guix may be an old version, and may retain obsolete software in the store;
  • guix home: it is as yet unclear how to manage guix home on a multi-tenant system; this discussion is left open for later.

8. Changelog

  • <2024-06-28 Fri> Merged the first two sections "What you need to understand…" and "Illustrating…" into one, and added the images.
  • <2024-06-24 Mon> Initial version