Aggressive Guix garbage collection on the Dam
On a multi-tenant GNU Guix system like the Dam, unprivileged users can trigger the download and installation of software in the store, the read-only part of the filesystem where all the software is.
Left unchecked, this capability will lead to the store eating all the available space.
Here we explain how we leverage GNU Guix's garbage collector to keep the store to a manageable size, fairly shared among users.
1. Illustrating what you need to know about Guix garbage collection
GNU Guix users install software in a profile, either explicitly specified with --profile, or residing in their default profile of ~/.guix-profile (which points to /var/guix/profiles/per-user/$USER/guix-profile).
Thus, the following command:
guix install --profile=/tmp/test-profile hello
creates a profile in /tmp/test-profile.
These profiles are merely directories of links to the store (which lives in /gnu/store/), where the actual software items are stored.
In the case of hello, for example:
$ ls -l /tmp/test-profile/ | grep hello
lrwxrwxrwx 1 root root 60 Jan 1 1970 bin -> \
/gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1/bin
Each item in the store is part of a dependency graph, which links an item (typically a software package) to its dependencies.
For example, the hello package we just installed depends, among others, on glibc to run:
$ guix graph --type=references hello | grep glibc | head -n 1
"/gnu/store/5mqwac3zshjjn1ig82s12rbi7whqm4n8-hello-2.12.1" -> \
"/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35" [color = darkgoldenrod];
And it depends, among others, on make to be built:
$ guix graph --type=bag hello | grep make | head -n 1
"/gnu/store/3zh2qpi897s2x229s93iakji86b08a20-hello-2.12.1.drv" -> \
"/gnu/store/34i09xrz49phnkij2c8k6ps37na6cr74-make-4.3.drv" [color = dimgrey];
Profiles act as garbage collector roots, indicating which pieces of software are actively needed at runtime.
Store items that cannot be reached from any garbage collector root in the runtime dependency graph are considered dispensable, and can be safely removed by the garbage collector without affecting user operations.
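To make the idea of reachability from roots concrete, here is a minimal sketch in Python. It is a toy model, not how Guix implements its collector: the store is a plain dictionary, and the item names (profile-1, hello, glibc, make) are shortened stand-ins for real store paths.

```python
# Toy model of a store: each item maps to the items it references at runtime.
# Item names are simplified stand-ins, not real store paths.
store = {
    "profile-1": ["hello"],
    "hello": ["glibc"],
    "glibc": [],
    "make": ["glibc"],  # only needed at build time: no live item references it
}

def live_items(store, roots):
    """Return every item reachable from the roots (the 'mark' phase)."""
    seen = set()
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(store[item])
    return seen

def collect_garbage(store, roots):
    """Return the items a collector would be allowed to delete (the 'sweep' phase)."""
    return set(store) - live_items(store, roots)

print(sorted(collect_garbage(store, roots=["profile-1"])))  # ['make']
```

In this model make references glibc, but nothing reachable from the profile references make, so make is the only collectible item.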
After installing hello, the store grew by 185 MB! Running guix gc afterwards removed build-time dependencies such as make; the final store size increase after the installation of hello came down to a much more reasonable 844 kB.
To reclaim these 844 kB with a mainstream package manager, I would uninstall the hello package.
This would not work with guix:
$ guix remove --profile=/tmp/test-profile hello
...
$ guix gc
...
At this point, the store did not shrink by 844 kB; it even became 24 kB larger!
Why? Because hello is still there:
$ ls -l /gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1
...
Why? Let's ask Guix:
$ guix gc --referrers /gnu/store/6fbh8phmp3izay6c0dpggpxhcjn4xlm5-hello-2.12.1
/gnu/store/6419i5z1xg96sxg6m43hyshfji91i863-profile
Wait, there's a profile still referencing hello? Yes, because old generations are kept in the store to allow one to roll back:
$ ls -l /tmp/test-profile*
lrwxrwxrwx 1 edouard users 19 May 11 16:43 /tmp/test-profile -> \
test-profile-2-link
lrwxrwxrwx 1 edouard users 51 May 11 16:20 /tmp/test-profile-1-link -> \
/gnu/store/6419i5z1xg96sxg6m43hyshfji91i863-profile
lrwxrwxrwx 1 edouard users 51 May 11 16:43 /tmp/test-profile-2-link -> \
/gnu/store/03nwdiibi8rxha84a4s9fj535i4d2jdi-profile
The big caveat is that guix remembers every change you make to every profile. Uninstalling a program from a profile will not make the corresponding store item garbage collectible. Instead, a reference to it will be kept in a previous generation of the profile. Previous profile generations are as much of a garbage collector root as the latest profile.
This design allows one to enjoy atomic updates and unlimited rollbacks. Disk space is cheap, and trading it for peace of mind is a great compromise (you can fearlessly update your system: any breakage can be reverted by rolling back to the previous generation).
Also, in contrast with other tools such as pip or virtualenv, all the users share the same store. Installing the same software twice (whether by two different users, or by the same user in two different profiles) will only make the system store the software once.
This deduplication saves a lot of space, as most users share a large common software base.
Or at least it would save a lot of space, if it weren't for the biggest cause of store bloat: version drift.
Astute readers will have noticed that in the command line listings, packages and profiles don't have such nice names as hello or glibc; instead guix refers to them as e.g. 5mqwac3zshjjn1ig82s12rbi7whqm4n8-hello-2.12.1.
The part before the package name is a hash (here is a good explanation; you may want to check the HN comments about it too). It is a hash of the package source, but also of all the package's dependencies (build-time and runtime).
This means that the dependency graph we were talking about is not just a simple directed acyclic graph, but a Merkle directed acyclic graph, a structure that pops up basically everywhere in computer science.
As a consequence, if there is the slightest change (even one bit!) in any of the dependencies of a package, then the package's full name changes, and we need to store it in another store item.
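The Merkle property can be illustrated with a few lines of Python. This is only a sketch of the principle, not Guix's actual hashing scheme: the store_name function, the use of SHA-256 truncated to 32 hex characters, and the source strings are all assumptions made for the example.

```python
import hashlib

def store_name(name, version, source, dependencies):
    """Toy store name: a hash over the source and the full names of all dependencies."""
    h = hashlib.sha256()
    h.update(source.encode())
    for dep in sorted(dependencies):
        h.update(b"\0")  # separator, so inputs cannot run into each other
        h.update(dep.encode())
    return f"{h.hexdigest()[:32]}-{name}-{version}"

# Two builds of make that differ only slightly in their source.
make_old = store_name("make", "4.3", "make source", [])
make_new = store_name("make", "4.3", "make source, patched", [])

# hello's own source is identical in both cases; only its dependency changed.
hello_old = store_name("hello", "2.12.1", "hello source", [make_old])
hello_new = store_name("hello", "2.12.1", "hello source", [make_new])

print(hello_old != hello_new)  # True
```

Because the name of each dependency already contains a hash of that dependency's own inputs, a change anywhere below hello propagates up into hello's name, even though the human-readable suffix stays hello-2.12.1.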
For example, if our first user installs hello again, but in the meantime an update to make has been made, then the new hello will not be the same package as the old hello, despite its source and the source of all its runtime dependencies being the same.
This may seem overly wasteful, but in the light of the recent supply chain attack on xz (see rsc's timeline and writeup, as well as the HN thread), trust me, this is a VERY GOOD THING.
Guix was not affected by this very clever attack because it resists systemd's cancerous growth in the Linux ecosystem, but even if it had been, the mitigation would have been easier than on other systems.
The bad thing about this VERY GOOD THING is that it becomes very easy for the version (or more precisely, the name) of hello to drift between two profiles, which causes obsolete versions of hello to linger in the store.
(On "name" rather than "version": the word "version", and formal systems like Semantic Versioning, often imply a teleological view of new being better. Guix's Merkle naming system for packages eschews such a view; the hash is just a secure record of which source code is needed to build and run a piece of software. Some dependencies may go up in version, but some may go down (e.g. to revert a buggy update), and any new combination automatically gets a new name. Of course meaningful version numbers have their place and are used internally in package definitions, but they are used in the pointer (the URL) to the source code; the hash of the actual content is what really drives the name of the package.)
So, let our first user reinstall hello after some change has happened somewhere in the dependency graph (e.g. in make).
Now, our first user runs guix gc --delete-generations. This extra flag deletes the old profile generations, so that old versions are no longer kept around. You can use it when you are willing to trade the ability to roll back to a known working state for disk space.
Alas, these efforts are thwarted by the existence of the other profile, which, being a garbage collector root, prevents the 5mqwa... version of hello from being collected.
The owner of the other profile must update too. Only then will a call to guix gc --delete-generations remove the old hello version and give us a garbage-collected store, in which both profiles share the same hello.
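The whole sequence can be replayed on a toy store in Python. This is a sketch under simplifying assumptions: the names first-gen1, other-gen2, hello-old and so on are made up, each profile generation is modelled as a single store item, and the set of roots is passed in by hand instead of being read from the system.

```python
# Each profile generation is a store item that references one version of hello.
store = {
    "hello-old": [],
    "hello-new": [],
    "first-gen1": ["hello-old"],
    "first-gen2": ["hello-new"],
    "other-gen1": ["hello-old"],
    "other-gen2": ["hello-new"],
}

def reachable(store, roots):
    """Return every item reachable from the given garbage collector roots."""
    seen, stack = set(), list(roots)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(store[item])
    return seen

def collectible(item, roots):
    """True if no root leads to the item."""
    return item not in reachable(store, roots)

# 1. The first user updated, but the old generation is still a root.
print(collectible("hello-old", ["first-gen1", "first-gen2", "other-gen1"]))  # False
# 2. The first user deleted old generations; the other profile still pins hello-old.
print(collectible("hello-old", ["first-gen2", "other-gen1"]))  # False
# 3. The other user updated and deleted old generations too.
print(collectible("hello-old", ["first-gen2", "other-gen2"]))  # True
```

Only in the third situation does no root lead to the old hello, which is why every profile has to be updated before the collector can do its job.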
2. Aggressive garbage collection
To reclaim space on the Dam we periodically run a script that:
- pulls the latest package definitions,
- removes any cached profile (those are created e.g. when using guix shell),
- updates all the other profiles,
- deletes all generations,
- collects the garbage.
By force-updating everybody, we ensure that there is no version drift, and that deduplication works as expected.
3. First caveat: pull, reconfigure, reboot, and then only collect the garbage
root learned the following lesson the hard way. One should:
- guix pull,
- guix system reconfigure,
- reboot,
- guix gc.
In that order.
Frantically trying to clear some space, root once ran guix pull, reconfigured, deleted old system generations, garbage collected, and then rebooted.
It turned out that pulling and reconfiguring had introduced a bug in an activation script. This bug prevented the server from rebooting, but because the old system generations had been deleted and the garbage collected, root could not roll back to a previous known working state from GRUB, as guix usually allows.
Lesson learned: only remove old system generations
once you are confident the current one is bootable.
Furthermore, rebooting after a reconfigure aligns some "magic profiles", as explained in the doc, which makes garbage collection more effective:
Note: The running system generation—referred to by /run/current-system—is not necessarily the current system generation—referred to by /var/guix/profiles/system: it differs when, for instance, you chose from the bootloader menu to boot an older generation. It can also differ from the booted system generation—referred to by /run/booted-system—for instance because you reconfigured the system in the meantime.
– https://guix.gnu.org/manual/en/guix.html#Invoking-guix-system-1
4. Second caveat: avoid guix pull
4.1. Guix pull is slow and demanding
Guix pull takes several minutes per user, and hogs so much RAM that services such as listen must be stopped before one can guix pull. (I still have not understood the exact reason why: the issue has been raised multiple times on the mailing list but I did not understand the answers, see e.g. https://lists.gnu.org/archive/html/guix-devel/2024-05/msg00110.html)
4.2. Guix pull's cache is gigantic
Running guix pull from a pristine user home takes around 20 min (this is of course hardware-dependent) and creates a ~700 MB $HOME/.cache/guix directory.
Subsequent runs are of course faster (4 minutes instead of 20).
Nevertheless, running guix pull on behalf of each user would generate as many $HOME/.cache/guix directories as there are users, which would pretty quickly negate any deduplication gains in the store.
A solution exists: maintain a guix pull cache in a public directory, from where unprivileged users can then overlay it over their own cache directories. This is brittle and hard to get right, but doable:
mount -t overlay overlay -o \
lowerdir="/root/.cache/guix",\
upperdir="/home/$user/.cache/guix-overlay",\
workdir="/home/$user/.cache/guix-workdir" \
"/home/$user/.cache/guix"
bindfs --mirror=$user /home/$user/.cache/guix /home/$user/.cache/guix
But this acrobatic cache management is unneeded thanks to a more powerful solution.
4.3. Solution: don't pull, deploy
When using guix deploy, the very expensive "Computing Guix derivation" step is run, not on the target computer, but on the computer from which guix deploy is run.
This, coupled with calling beaverlabs-channels on the operating system, will make it so the system's guix, which lives in /run/current-system/profile/bin/guix, is the latest guix with all the channels up to date.
(define (beaverlabs-channels os)
  "Add beaverlabs channels by default. Without this, it won't remain after a reconfigure"
  (modify-service os guix
    (channels %beaverlabs-channels)
    (guix (guix-for-channels %beaverlabs-channels))))
Thanks to this, after a guix deploy, there is no need for anyone to run guix pull, as the system's guix is the latest one!
5. Third caveat: you need store space to free store space
Before you can reclaim the space taken in the store by old versions of the user's software, you must first update said software, which will take up even more space in the store.
Therefore, the store must be aggressively garbage collected before things get dire, otherwise it will not be possible to aggressively garbage collect the store.
Things need to get worse before they get better.
6. What we end up with: NukingGC.scm
The solution we end up with is the NukingGC.scm script in our channel. One can run it as root with:
guix repl /path/to/NukingGC.scm
It will:
- delete:
  - old system generations,
  - all cached profiles,
- and garbage collect with --delete-generations.
At this point, to make the store leaner, the script must make it bigger, temporarily. It:
- updates the default profiles of all users,
- garbage collects with --delete-generations,
- updates all the remaining profiles, for all users,
- garbage collects one last time with --delete-generations.
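The ordering of these steps can be sketched as a dry-run plan in Python. This is not the NukingGC.scm script itself, which is written in Guile: the function nuking_gc_plan, its parameters, the /home/USER/.cache/guix/profiles location assumed for cached profiles, and the use of guix package --upgrade to update a profile are assumptions made for illustration. The function only builds a list of command lines, it executes nothing.

```python
def nuking_gc_plan(users, other_profiles):
    """Return the shell commands of one run, in order. Nothing is executed.

    users: list of user names.
    other_profiles: dict mapping a user name to that user's non-default profile paths.
    """
    gc = "guix gc --delete-generations"
    plan = ["guix system delete-generations"]
    for user in users:
        # Assumed location of the profiles cached by guix shell.
        plan.append(f"rm -rf /home/{user}/.cache/guix/profiles")
    plan.append(gc)
    # The store grows here: default profiles are updated before collecting again.
    for user in users:
        default = f"/var/guix/profiles/per-user/{user}/guix-profile"
        plan.append(f"guix package --profile={default} --upgrade")
    plan.append(gc)
    # Then the remaining profiles, followed by a last collection.
    for user in users:
        for profile in other_profiles.get(user, []):
            plan.append(f"guix package --profile={profile} --upgrade")
    plan.append(gc)
    return plan

for command in nuking_gc_plan(["edouard"], {"edouard": ["/tmp/test-profile"]}):
    print(command)
```

In a real run each upgrade would have to be executed as the owner of the profile, and discovering the remaining profiles is left out of this sketch.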
7. Further work
The script has two blind spots:
- the user's guix: as we don't use guix pull, a user's guix may be an old version, and may retain obsolete software in the store.
- guix home: it is as yet unclear how to manage guix home in a multi-tenant system; this discussion is left open for later.
8. Advertisement
Did you like what you read?
You can help me write more by:
- renting a guix VPS from me,
- hiring me for a consulting gig: software development, cybersecurity audit and training, cryptocurrency forensics, etc. see my personal page,
- letting me teach you Python, or spreading the word about this course,
- or buying a very, very secure laptop from me.
9. Changelog
- Merged the first two sections "What you need to understand…" and "Illustrating…" into one, and added the images.
- Initial version