Adapting Plan 9's listen to GNU Guix

Here is a comprehensive adaptation of Plan 9's elegant network service management design to the Linux environment, focusing on the Guix System distribution. The proposed listen utility starts network services by executing files named after the protocol and port they serve. This approach offers significant advantages over traditional Linux setups: per-user, per-port, and per-program allocation of ports, in contrast with the binary privileged/unprivileged model on Linux; enhanced security through process isolation; and network transparency for service scripts. We also detail the development of auxiliary tools and contributions: a Go-based 9P2000.L FUSE client needed for container isolation, an improvement to the p9ufs 9P2000.L server, and a network-transparent implementation of the finger protocol. We straightforwardly achieve a level of simplicity and security that is currently achievable on Linux only with complex configurations, or not at all. The paper concludes with reflections on the challenges and limitations encountered in adapting Plan 9's models to the Linux platform, pointing out the inherent difficulties in reconciling Linux's legacy structures with Plan 9's more streamlined and network-native approach.

1. Introduction

Plan 9 uses a piece of software called listen to start network services.

Network services are defined by the presence, in a directory watched by listen, of executable files whose name is of the form <protocol><port>.

For example, the tcp7 file implements the echo protocol on port 7.

When a client connects on one of the ports, listen starts the corresponding executable file, and

  • forwards incoming data from the connection to the process' stdin,
  • forwards outgoing data from the process' stdout back to the connection.
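The naming convention can be sketched as a small shell function. This is only an illustration, not code taken from the listen script: the name parse_service_name is hypothetical, and restricting the protocol part to lowercase letters is an assumption.

```shell
# Hypothetical helper (not taken from listen itself): split a service file
# name such as "tcp7" into its protocol and port parts.
# Prints "<protocol> <port>" and returns 0 on success, returns 1 otherwise.
parse_service_name() {
    local name="$1" proto port
    proto="${name%%[0-9]*}"       # everything before the first digit
    port="${name#"$proto"}"       # the rest of the name
    case "$proto" in
        ''|*[!a-z]*) return 1 ;;  # assumption: protocols are lowercase words
    esac
    case "$port" in
        ''|0*|*[!0-9]*|??????*) return 1 ;;  # digits only, no leading zero
    esac
    [ "$port" -le 65535 ] || return 1
    printf '%s %s\n' "$proto" "$port"
}
```

With this sketch, parse_service_name tcp9418 prints "tcp 9418", while companion files such as tcp9418.namespace are rejected.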

This elegant design presents three advantages over the current state of affairs on Linux:

  • Instead of a root-owned configuration file, the use of one file per port allows a per-user, per-port, per-program allocation of ports (see section 5).
  • Plan 9 being Plan 9, each process gets its own namespace, that is, its own view of the system resources, isolated from other processes. Such a level of security is available on Linux only through the use of containers (see sections 7.2 and 7.3).
  • Network services written for listen are said to be network-transparent: they need not contain any network code at all. As they read and write on the standard streams, they require less work to write than the same networked service would; they can rely on the plethora of existing command line filters that, too, read from stdin and write to stdout. Example implementations of echo (section 6.1) and finger (section 6.2) illustrate this concept.
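To make the third point concrete, here is a minimal sketch of a network-transparent service. It is a hypothetical example, not one of the services deployed on the-dam.org: the function name greet_service and its greeting protocol are invented for illustration. Note that it contains no network code whatsoever.

```shell
# Hypothetical network-transparent service: read one request line on the
# standard input, answer on the standard output.
greet_service() {
    local request cr
    cr=$(printf '\r')
    IFS= read -r request || true    # one request line; EOF is not an error
    request="${request%"$cr"}"      # network clients usually send CRLF
    printf 'Hello, %s!\n' "${request:-stranger}"
}
```

Because the service only uses the standard streams, it can be exercised without any network at all, for example with printf 'alice\r\n' | greet_service. A real service file would simply end with a call to the function.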

This document describes our effort in adapting this design to Linux in general and to the Guix System distribution in particular.

The deployment of our version of listen on the Dam (our public access UNIX ("PubNix") server, running Guix System with Beaver Labs' guix channel) frames the exposition that follows. However, listen is not specific to the Dam, or even to Guix System (the operating system). While it requires GNU Guix (the package manager, see section 3.2) to be installed, it can be deployed on Linux distributions other than Guix System (and has been, on my own workstation running Arch). Doing so requires careful system administration and painful wrangling with systemd. Guix System (the operating system) avoids this thanks to its declarative configuration primitives (see section 4) and absence of systemd.

The version of listen presented here is in active use on the-dam.org, and the echo (section 6.1), finger (section 6.2), and git (section 6.3) services are reachable with, respectively:

echo hello | nc the-dam.org 7
echo | nc the-dam.org 79
git clone 'git://the-dam.org/listen'

2. Contributions

2.1. listen

The work presented herein yielded more than the listen bash script one can download with:

git clone git@the-dam.org:listen

2.2. f29p

A Golang 9P2000.L FUSE client was developed from scratch:

git clone git@the-dam.org:f29p

This client is needed to mount 9P servers in the containers in which listen isolates the network services (section 7.3). It is a seldom-documented fact of Linux that unprivileged mounting is reserved to a select few filesystems, among which 9P is not present. To our knowledge, no other publicly available piece of software allows one to mount a 9P2000.L server from within a Linux container.

2.3. p9ufs

A pull request implementing UnlinkAt was merged into p9ufs, a Golang 9P2000.L server: https://github.com/hugelgupf/p9/pull/87/files

2.4. os/listen

listen requires deep changes in the system it is installed on (for example, a change in the default privileged port range, see section 5), but we wrote a set of operating system configuration functions (see section 4) that make installing listen a literal one-line change in a configuration file. These functions can be added to Guix System through our channel (Klein 2023).

(channel
  (name 'beaverlabs)
  (url "https://gitlab.com/edouardklein/guix")
  (branch "beaverlabs"))

2.5. fingerd

A network-transparent implementation of finger (see section 6.2) was developed to work with our listen. It should work with Plan 9's listen as well, as soon as somebody writes a Python 3 interpreter for Plan 9, which should be right around the time we sunset IPv4 globally.

Fetch it with:

git clone git@the-dam.org:fingerd

3. Glossary

Before we delve any further into technical explanations, we need to make explicit that there are three sets of words whose meaning is highly ambiguous.

This ambiguity needs to be removed lest the article become very confusing for the reader.

3.1. Namespace

First comes namespace. A namespace in Plan 9 is the view a process has of the filesystem. Because every system resource is available as a file, a namespace is the view a process has of the system's resources (Pike et al. 1993).

On Linux, namespaces are an isolation mechanism bolted upon processes. Linux processes are not, by default, as isolated from one another as they are in Plan 9.

Because on Linux everything is definitely not a file, there are multiple kinds of namespaces. One can be in a network namespace, a mount namespace, a user namespace, etc. (Kerrisk 2013; Unshare(1) – Linux User’s Manual 2023, 2023)

To remove this ambiguity, we will explicitly specify "Linux namespace", "Linux mount namespace", etc. or use the improper term of containers when talking about the specific Linux isolation mechanism, and reserve the bare namespace term for the Plan 9 generic notion of a process' view of the system resources.

When we talk about the namespace of a Linux process, we talk about what it sees of the underlying system. This view is constructed with the help of different, potentially nested Linux namespaces, and 9P mounts through FUSE.

3.2. Guix

GNU Guix is a package manager (Courtès 2013) that can be installed on any Linux distribution.

Guix System is a Linux distribution that uses GNU Shepherd as its daemon manager, lacks systemd, and of course uses Guix as its package manager. It goes further by providing a declarative configuration system for the whole operating system, with atomic updates, roll-backs, etc. (Neidhart 2019)

Waters are muddied by the fact that the guix system command, provided by GNU Guix the package manager (which can be installed on any Linux distribution) allows one to instantiate a Guix System operating system as a VM, a container, a docker image, etc.

We will use Guix to refer to the package manager, and Guix System to the operating system.

3.3. Service

In UNIX parlance, a service is a background process (a daemon), typically launched by a service manager (SysVinit, rc, shepherd, systemd, etc.). Examples include network services such as a web server, but also the cron daemon.

In the context of listen, services refer to the network services that listen manages.

On Guix System, a service is a broader notion, that encompasses network services, daemons, but also any aspect of the system configuration, such as udev rules, user accounts, etc.

We will always specify, unless the context makes it absolutely clear, whether we are talking about network/listen services, or Guix services.

We will avoid calling UNIX services services, and use daemon instead.

4. Operating system configuration functions

As mentioned in section 2, listen is not merely a bash script, but also a set of deep modifications to the operating system it runs on:

  • installing listen, f29p and p9ufs;
  • creating a nobody-like user and group, both named listen;
  • making, chowning, and chmoding the following directories, according to some port attribution policy:
    • /srv/listen/, owned by root:root, with perms rwxr-xr-x,
      • in it, the tcpXXX and tcpXXX.namespace scripts, owned by e.g. alice:listen, with perms rw.r-.---;
    • /run/listen/tcpXXX/, owned by e.g. alice:listen, with perms rwxrwx---;
    • /run/9p/, owned by root:root, with perms rwxrwxrwt;
    • /var/log/listen/tcpXXX/, owned by e.g. alice:listen, with perms r-xrwx---.
  • creating the default guix profile in /run/listen/profile;
  • making sure /run/listen is deleted on reboot;
  • setting the available source ports for outgoing TCP connections to 49152-65535, away from the default of 32768-60999;
  • setting the privileged port range to 0-48152, away from its default of 0-1023;
  • setting the cap_net_bind_service +p capability on the with-cap-bind wrapper, and making it r-x------ for listen:listen;
  • creating, for each user, an HTTPS redirect from e.g. https://alice.example.com to the first port of alice's range;
  • creating service aliases as in section 7.5, e.g. /srv/listen/finger -> /srv/listen/tcp79;
  • starting the listen daemon on boot,
  • but only after all the daemons it needs have started.

While it is technically possible to apply these modifications to a Linux system using the usual system administration tools (sysctl, adduser, chown, etc.), it would be an ill-advised tall order.

Using our additions to Guix System's declarative configuration mechanism is easier and safer.

Familiarity with GNU Guix, Guix System, or GNU Guile is not expected from the reader, as the examples provided here are quite self-explanatory to anyone with any programming experience. Just understand that what is written f(a, b) in most languages is (f a b) in GNU Guile, and that the last expression of a function is its return value.

The provided primitives are based on an extension of Guix System's configuration mechanism. Guix System relies on a directed acyclic graph: nodes being Guix System services, and an edge from e.g. nginx to account denoting that installing nginx will create a user account on the system. This graph is folded (in the functional programming sense of fold) into a script, collapsing all the extensions into a GNU Guile script that actually changes the system so that it conforms exactly to the declaration (Courtès 2015; “Service Composition – (GNU Guix Reference Manual)” 2024).

While powerful, this mechanism is hard to extend as it requires familiarity with both Guix System and GNU Guile. We abstracted it away thanks to the use of functions that take an operating-system record as an argument, and return a modified operating-system record. These functions can thus be chained, human-centipede style (Six 2010), in a syntax much more familiar to users of imperative languages, not unlike a Dockerfile, while keeping all the power of the Guix System service-graph mechanism.

5. A fine-grained access control API for ports

Without listen, ports on Linux fall under the coarse dichotomy of privileged and unprivileged. Privileged ports are traditionally ports below 1024 (but a sysctl call to set the net.ipv4.ip_unprivileged_port_start kernel variable will change that). One used to need to be root to bind to a privileged port, gaining privileged access to the whole operating system as a side effect. This changed when Linux got a new feature called capabilities. The CAP_NET_BIND_SERVICE capability allows one to bind to privileged ports on a per-program basis. Having this capability grants no other rights; but it applies to all privileged ports: CAP_NET_BIND_SERVICE accepts no port-based configuration.
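The dichotomy boils down to a single threshold, which the following sketch makes explicit. The function names are hypothetical; the sketch assumes a Linux kernel that exposes the threshold under /proc/sys, and falls back on the traditional value of 1024 otherwise.

```shell
# Print the first unprivileged port (1024 unless the sysctl was changed).
unprivileged_port_start() {
    local f=/proc/sys/net/ipv4/ip_unprivileged_port_start
    if [ -r "$f" ]; then cat "$f"; else echo 1024; fi
}

# Succeed if PORT is privileged. The optional second argument overrides
# the threshold, which is handy for reasoning about other configurations.
is_privileged_port() {
    local port="$1" start="${2:-$(unprivileged_port_start)}"
    [ "$port" -lt "$start" ]
}
```

For instance, is_privileged_port 8080 1024 fails, while is_privileged_port 8080 49152 succeeds: raising the threshold is how all the ports that services may use can be made privileged at once.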

listen solves this problem because it equates ports with file names. Using UNIX's file access control API (chown, chmod, etc.), one controls access to ports on a per-port, per-user (and in turn, with the setuid bit, per-program as well) basis.

By keeping an empty, world-writable /srv/listen/ directory, root can let any user bind to any port.

At the other end of the spectrum, root can create all the tcp* files (from tcp1 to tcp65535, one per port), and chown each of them to whomever should control the associated port. The files remain non-executable until the appropriate user wants to activate the server, at which point she runs chmod +x on the file.
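This second policy can be sketched with ordinary file operations. The helper names reserve_port and port_is_active are hypothetical, and the 640 mode is an assumption chosen for illustration; the directory is a parameter so that the sketch can be tried in a scratch directory instead of /srv/listen/.

```shell
# Root's side: reserve a port by creating its (non-executable) service file.
reserve_port() {
    local dir="$1" port="$2" owner="${3:-}"
    local file="$dir/tcp$port"
    [ -e "$file" ] || : > "$file"   # never truncate an existing script
    chmod 640 "$file"               # readable, but not executable
    if [ -n "$owner" ]; then
        chown "$owner" "$file"      # needs root, e.g. alice:listen
    fi
}

# listen's side: a port is served only if its file is executable.
port_is_active() {
    [ -x "$1/tcp$2" ]
}
```

After reserve_port "$dir" 7000, the port stays inactive until the owner of the file runs chmod +x on it.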

On the-dam.org, the last few bits of the hash of the username yield a port number, and we give that user a 12-port range starting at that number. This lets us avoid any kind of bookkeeping for port allocation: users come and go, and each gets an automatically assigned range with a negligible risk of collision. We also offer an HTTPS redirection from e.g. https://alice.the-dam.org to the first port in alice's range on the-dam.org.

This port allocation is automatically derived from the list of human users any time a user is added or removed, with no need for human intervention.
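A minimal sketch of such a derivation follows. It does not reproduce the scheme used on the-dam.org: the hash function (the POSIX cksum CRC), the base port, and the number of slots are all assumptions chosen for illustration, and ranges are aligned on multiples of 12 so that two users either share a range or do not overlap at all.

```shell
# Hypothetical derivation of a 12-port range from a user name.
# Prints "<first port> <last port>".
user_port_range() {
    local user="$1" base=10000 slots=3000 width=12
    local crc first
    crc=$(printf '%s' "$user" | cksum | cut -d' ' -f1)
    first=$(( base + (crc % slots) * width ))
    printf '%d %d\n' "$first" "$(( first + width - 1 ))"
}
```

The function is deterministic, so the allocation can be recomputed at any time from the list of users alone, with no state to store. With these assumed parameters, every range stays below 49152, the start of the source port range mentioned in section 4.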

6. Network service scripts

Let's see how listen works on the-dam.org, from the point of view of the user. The implementation is detailed later, in section 7.

We will study three different network services in increasing order of complexity:

  • the echo service on port 7, which echoes back whatever the clients send,
  • the finger service on port 79, implemented as a simple Python script reading and writing to and from the standard streams,
  • the git protocol on port 9418, implemented with git daemon, which insists on listening on a port instead of using the standard streams.

In contrast with what almost always happens on Linux, where a process gets a full view of the system limited only by its owner's identity, listen launches network service scripts in an extremely limited namespace, as Plan 9's listen does. Our listen must use a container to achieve this, while Plan 9's fork (Fork(2) – 9front’s Manual 2024) offers this kind of isolation for free. Network services running in an empty namespace are not very useful, so this section shows how users can populate their network service's namespace.

6.1. Echo service

The echo protocol is implemented by linking cat to /srv/listen/tcp7:

ls -l /srv/listen/tcp7
lrwxrwxrwx 1 root root 65 Mar 21 21:14 /srv/listen/tcp7 -> \
    /gnu/store/mppp9hwxizx9g9pikwcvvshb2ffxyq7p-coreutils-9.1/bin/cat

This link is created by Beaver Labs' listen/echo function:

(define (listen/echo os)
  "Return a copy of OS, in which listen's echo (tcp7) service is active."
  (extend-service
     os
     activation
     #~(begin
         (when (file-exists? "/srv/listen/tcp7")
           (delete-file "/srv/listen/tcp7"))
         (symlink #$(file-append coreutils "/bin/cat") "/srv/listen/tcp7"))))

This service can run in the empty namespace that listen provides by default.

listen-echo.svg
Figure 1: The container in which listen runs the echo service. From the point of view of the service script (cat), data comes from the standard input, and the system is almost completely empty.

6.2. The finger service

The above echo example ranks among the simplest things one can do with listen. Let's now study a more complex service, the finger service listening on port 79. Our finger implementation is a simple 68-line Python script that reads a query on its standard input, parses it, sets some of the environment variables from the CGI specification (allowing queries like finger 'hello?name=alice&greeting=Howdy@the-dam.org'), and execs the requested script in /srv/finger/. This script's output is what the remote finger client gets.
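The query handling can be sketched in shell. This is not the actual fingerd, which is written in Python: the function name parse_finger_query and the FINGER_SCRIPT variable are hypothetical, and refusing names that contain a slash or start with a dot is an assumption about what a careful implementation would do. QUERY_STRING is the standard CGI variable.

```shell
# Hypothetical sketch: split a finger query line into the name of the
# script to run and a CGI-style query string.
parse_finger_query() {
    local line="$1" cr
    cr=$(printf '\r')
    line="${line%"$cr"}"                 # finger clients send CRLF
    case "$line" in
        *\?*) FINGER_SCRIPT="${line%%\?*}"; QUERY_STRING="${line#*\?}" ;;
        *)    FINGER_SCRIPT="$line";        QUERY_STRING="" ;;
    esac
    case "$FINGER_SCRIPT" in
        */*|.*) return 1 ;;              # refuse to leave /srv/finger/
    esac
    export QUERY_STRING
}
```

Given the line 'hello?name=alice&greeting=Howdy', the sketch sets FINGER_SCRIPT to hello and QUERY_STRING to name=alice&greeting=Howdy; an empty line leaves both empty, which a server may treat as a request for the list of users.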

This network service is active now, and one can simply query it with a finger client or by running:

echo | nc the-dam.org 79

to get a list of the available user names.

finger illustrates how to clear two hurdles most real-life use cases will meet:

  • it needs to access data from outside the container,
  • it relies on custom software, unavailable from the vanilla version of guix provided by listen's container.

Accessing data from outside listen's container is done, as it is in Plan 9, by mounting 9P servers in the service's namespace.

listen will look for a /srv/listen/tcp79.namespace file. This is not, as it is in Plan 9, a namespace file understood by newns or addns (Auth(2) – 9front’s Manual 2024). It is a proper script, called by listen, and expected to set up its own environment, its own view of the filesystem, and its own network before calling exec $@, relinquishing control to a command built by listen.

The default namespace script, /srv/listen/listen.namespace, called in the absence of a service-specific script, is quite simple:

#!/gnu/store/aslr8ym1df4j80ika5pfxy5kbfv4iz3w-bash-5.1.16/bin/bash
set -euo pipefail
# This is the default namespace file for listen services. This is where you
# mount the 9P services all your listen services need.
exec "$@"  # Once done, call the command provided as an argument by listen

The declarative configuration will not overwrite this file if it already exists, making it easy to configure the default namespace in a site-specific manner.

finger's namespace file contains:

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /srv/finger
f29p unix!/srv/9p/finger /srv/finger &
exec "$@"

This mounts the 9P server listening on /srv/9p/finger to /srv/finger.

This 9P server daemon is started and managed by shepherd, Guix System's daemon manager, thanks to a call to os/9p-serve in the function that configures finger:

(os/9p-serve "/srv/finger/" "/srv/9p/finger" #:name '9p-finger
             #:user "listen" #:owner "listen" #:group "listen" #:mode "700")

This call creates a socket on /srv/9p/finger. On the socket listens an instance of p9ufs, a modern Golang 9P2000.L implementation. This process is owned by listen, and so is the socket.

On Plan 9, namespace operations fail or succeed thanks to the authentication mechanism embedded in 9P. This authentication mechanism relies on the factotum process mounted on /mnt/factotum having the required credentials (Cox et al. 2002).

Despite a previous attempt at porting this mechanism to Linux (Klein and Gette 2023), this mechanism is not readily available to listen. Instead, to control the operations on files outside of the service container, listen relies on the ownership of the p9ufs process and the ownership and permissions of the socket file.

In this particular case, listen gets a read-only by default (one has to pass #:read-write "1" to os/9p-serve to get read-write access) view of /srv/finger, with the same rights as it would have outside of the container. In the next section we will see an example of user git "lending" its read rights on /srv/git to listen (which can't read /srv/git whose owner, group, and mode are git:users r-xrwx---).

This choice makes it possible for e.g. alice to provide finger information to her servermates only, who, belonging to the users group, may call her script while logged in, whereas anonymous users from the internet, being confined to listen's identity, won't see her script as executable:

ls -l /srv/finger/alice
-rwxr-x--- 1 alice users 121 Mar 18 10:57 /srv/finger/alice

Using custom software is done with a Guix profile. While on Plan 9 one makes software accessible by binding various directories over /bin, the story is more involved on Linux. Due to constraints imposed by ubiquitous dynamic linking, interpreted languages, and mostly-standardized-but-not-quite default paths, one has to rely on a myriad of search-paths. search-paths are environment variables that tell the software where to find the resources it needs.

Guix can set up profiles (Courtès 2018) that are links to its immutable, content-addressed store. In it are synthetic directories with links to all the needed resources (which also live in the store), as well as a profile script that will set the search-paths to those resources.

listen will load the profile in /run/listen/tcp79/profile before calling the service namespace script and the service script. A default profile is used if /run/listen/tcpXXX/profile does not exist.

In the case of finger, the profile is created by a call to os/profile:

(os/profile "/run/listen/tcp79/profile"
            #:name 'finger-profile
            #:packages '("python" "the-dam-org-f29p"))

which sets up two packages: the f29p 9P FUSE client, and the Python interpreter.

listen-finger.svg
Figure 2: The containers in which listen runs the finger service: the p9ufs 9P server has read-only access to the global /srv/finger directory, and this access is transferred via the /run/9p/finger socket to the tcp79 network service script, which 9P-FUSE-mounts the server in its own namespace. Note that here, as in echo, the network service script has no access to the internet and receives data on its standard input, and sends data on its standard output.

6.3. The git daemon

As revealed by an analysis of the fourth edition source code, 9front's source code, and a question asked on 9fans (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024), Plan 9 relies, for the following protocols (as well as a few others), on servers reading and writing on the standard streams: ftp, (ssh), telnet, smtp, http, pop, imap, samba, rlogin, lp, 9P (of course).

Some of those servers, such as rc-httpd, have been ported to BSD or Linux, but others rely for example on /net (Presotto and Winterbottom 1993), which does not exist (yet?) on Linux.

On the-dam.org, git repositories are made available through a user called git, as which anybody can connect through ssh, e.g.:

git clone git@the-dam.org:listen

However, a bug (“Can’t Clone a Git Repo over Anonymous SSH –- Issues.Guix.Gnu.Org” 2023) prevents Guix from using this kind of access, and so we must open port 9418 to let git use its own unauthenticated, unencrypted protocol (but Guix checksums the code after fetching so those two points are not that big of a deal).

Git does provide an easy way to run a server speaking this protocol: just run git daemon. However, git daemon binds to port 9418, and does not use the standard streams.

One could extract the git protocol from the code and create a standalone standard stream network transparent server, but that is a lot of work.

What we do instead is abuse the namespace script. Instead of relinquishing control to a command crafted by listen, here is what tcp9418.namespace does:

#!/usr/bin/env bash
set -euxo pipefail
mkdir -p /srv/git
f29p unix!/srv/9p/git /srv/git &
ip link set lo up
git daemon --base-path=/srv/git --export-all&
exec socat -dd UNIX-listen:/run/socket,fork TCP-CONNECT:127.0.0.1:9418

Lines 3 and 4 play the same role as for finger: making the necessary data available to the service by mounting a 9P server.

Said 9P server is started thanks to this line in the operating system configuration function listen/git-daemon:

(os/9p-serve "/srv/git/" "/srv/9p/git" #:name '9p-git
             #:user "git" #:owner "listen" #:group "listen" #:mode "700")

Here we see that only user listen can use the socket to mount the 9P server, but the process listening on the other end is not owned by listen but by git. This is because listen is not privileged enough to read the actual /srv/git. By making user git run the 9P server, listen gets exactly the kind of access it needs, on a very specific and well-defined subset of the global filesystem.

Line 5 of the git daemon namespace script sets up a loopback network interface. Behind the generic "container" term hide Linux' namespaces (same name as Plan 9, different feature). Important here is a Linux network namespace, that makes a process believe it owns all network interfaces (it does, but not the actual ones). Setting a dummy loopback interface on line 5 lets us start git daemon on line 6, where it will happily listen on port 9418, blissfully unaware that nobody but its parent can see the interface it is listening on.

Instead of calling exec $@ as expected, the namespace script execs socat in such a way that any connection to the /run/socket socket is forwarded to the container's port 9418. Creating this socket is the signal listen is waiting for before it starts forwarding incoming data from the internet to the service. Failure to make it appear within a handful of seconds is a sign that the service failed to start.
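The readiness check can be sketched as a polling loop. This is an illustration, not the code of listen: the function name wait_for_path and the default timeout of five seconds are assumptions, and the sketch only tests that the path exists, where a stricter check would use test -S to make sure it is a socket.

```shell
# Wait until PATH appears, polling once per second for at most TIMEOUT
# seconds (default: 5). Returns 0 if the path appeared, 1 otherwise.
wait_for_path() {
    local path="$1" tries="${2:-5}"
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 1
        tries=$(( tries - 1 ))
    done
    return 1
}
```

A supervisor would call wait_for_path /run/listen/tcp9418/socket and treat a failure as a sign that the service did not start.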

Usually listen calls socat inside the container like so:

socat -dd UNIX-listen:/run/socket,fork EXEC:/srv/$service

where /srv/$service is of course the service script, which will communicate with the internet through its standard streams.

By hijacking the expected behaviour and calling socat by itself, tcp9418.namespace can forward data to git daemon's port instead.

As for the tcp9418 script, it is just a link to true. It is never called: it exists only so that listen, detecting its presence, starts the service.

To make git, ip, etc. available to the service, /run/listen/tcp9418/profile is created by a call to:

(os/profile "/run/listen/tcp9418/profile"
            #:name 'git-profile
            #:packages '("git" "iproute2" "coreutils" "bash" "socat"
                         "the-dam-org-f29p"))
listen-git.svg
Figure 3: The p9ufs 9P server has read-only access to the global /srv/git directory, and is owned by the git user (user listen wouldn't be able to read /srv/git). This access is transferred, via the /run/9p/git socket and a 9P FUSE mount, to the namespace of the git service script, itself owned by listen. The git service script is itself never started; instead, /srv/tcp9418.namespace sets up a loopback network interface, starts the git daemon, and hijacks the usual procedure by launching socat, instead of relinquishing control to listen. This allows the incoming data to be redirected to the container's port 9418 instead of /srv/tcp9418's standard streams.

6.4. To recap

From the user's point of view, service scripts are fully network transparent. They read data on their standard input and write answers on their standard output. One instance of the script will be launched for each new connection.

Service scripts may specify software dependencies by setting a Guix profile in /run/listen/tcpXXX/profile, either manually (by calling guix install --profile=/run/listen/tcpXXX/profile ...) or with the declarative operating system configuration function os/profile.

Service scripts are run in a container fully isolated from the rest of the system. To access data from the outside, one mounts a 9P server in the service's companion namespace script. This 9P server can be started manually (calling p9ufs), or through the use of the declarative operating system configuration function os/9p-serve.

The service namespace script can also be abused to run servers that are not network-transparent, without rewriting them to use the standard streams. This facility allows typical web applications to run unmodified. the-dam.org redirects https://alice.the-dam.org to the first port in alice's range, and does so for all users, making web application deployment easy and rootless.

6.5. To go further

Additional configuration abstractions are available, for example the unprivileged counterparts to os/profile and os/9p-serve are home/profile and home/9p-serve. They allow non-root users to run their own 9P servers and create their own profile, so that the service script they own can access them without root having to intervene. Describing them is outside the scope of this document, but they are available to the-dam.org users.

Also, the register-listen-service function is available to inform the shepherd that some daemons (typically, the 9P servers) shall be started before listen.

The bareness of the container in which the service script runs provides two main advantages:

  • It increases security by isolating the service script from the rest of the system. An attacker gaining remote code execution in the container has no obvious way to impact the rest of the system except a denial of service through resource exhaustion (e.g. a fork bomb).
  • It forces the explicit declaration of all used resources (software dependencies, files) and processes (in our example, the git server). This explicitness makes it trivial to run the exact same script in a slightly different container for development, testing, or failover purposes.

7. How listen works

7.1. Configuration

listen will loop over the files in /srv/listen/ whose name matches tcpXXX where XXX denotes a port number.

When it finds an executable file, for example /srv/listen/tcp7, listen:

  • creates an isolated container in which it:
    • activates the service's profile at /run/listen/tcp7/profile if it exists, or the default one at /run/listen/profile otherwise,
    • runs the service namespace script /srv/listen/tcp7.namespace if it exists, or the default namespace script /srv/listen/listen.namespace otherwise,
    • runs socat to link the /run/listen/tcp7/socket socket to the standard streams of the service script /srv/listen/tcp7,
    • logs the standard error in /var/log/listen/tcp7/service.log;
  • then, outside the container, launches a daemon that listens on port 7 and forwards data back and forth to and from the service script (via /run/listen/tcp7/socket).
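The lookup rules above, with their fallbacks, can be summarized by the following sketch. The function name resolve_service and its optional root prefix are hypothetical; the prefix only exists so that the function can be tried against a scratch directory rather than a real deployment.

```shell
# Print the files listen would use for service NAME (e.g. tcp7).
# ROOT is an optional prefix prepended to every path (empty means "/").
resolve_service() {
    local name="$1" root="${2:-}"
    local profile="$root/run/listen/$name/profile"
    local namespace="$root/srv/listen/$name.namespace"
    # Fall back on the defaults when no service-specific file exists.
    [ -e "$profile" ]   || profile="$root/run/listen/profile"
    [ -e "$namespace" ] || namespace="$root/srv/listen/listen.namespace"
    printf 'profile=%s\n'   "$profile"
    printf 'namespace=%s\n' "$namespace"
    printf 'socket=%s\n'    "$root/run/listen/$name/socket"
    printf 'log=%s\n'       "$root/var/log/listen/$name/service.log"
}
```

On a system where tcp7 has neither a profile nor a namespace script of its own, resolve_service tcp7 reports the default profile and the default namespace script.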

listen uses inotify to watch the /srv/listen/ dir for changes. When notified of a change, listen scans the whole directory again:

  • listen leaves alone running daemons
    • whose service script remains executable,
    • and whose content has not changed since they started (to achieve this, listen caches the hash of the script at start time, and compares the current hash with the cached one during the new scan),
  • listen restarts running daemons
    • whose service script remains executable,
    • but whose content changed since they started,
  • listen stops every running daemon whose service script has lost its execution bit,
  • listen starts every service script that has become executable.
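The four cases above amount to a small decision table, sketched below. This is not the code of listen: the function names are hypothetical, the choice of sha256sum (from GNU coreutils) as the hash is an assumption, and an empty cached hash is used here to mean "no daemon is running for this script".

```shell
# Hash of a service script, as cached when its daemon is started.
script_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Decide what to do with SCRIPT during a rescan. CACHED is the hash
# recorded when the daemon started, or empty if no daemon is running.
# Prints one of: start, keep, restart, stop, ignore.
rescan_action() {
    local script="$1" cached="${2:-}"
    if [ -x "$script" ]; then
        if [ -z "$cached" ]; then
            echo start
        elif [ "$(script_hash "$script")" = "$cached" ]; then
            echo keep
        else
            echo restart
        fi
    elif [ -n "$cached" ]; then
        echo stop
    else
        echo ignore
    fi
}
```

The extra ignore case covers files that are neither running nor executable, such as ports that have been reserved but not yet activated.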

7.2. First security layer: almost unprivileged user listen

The listen process runs as the nobody-like user listen. This protects user data from unauthorized reads or writes should listen be compromised or misbehave.

listen-one-container.svg
Figure 4: listen's container prevents damage, even when the service process (here cat), gets compromised.

As a second line of defence, we would have liked listen to run in a container, seeing of the system only what its job requires (note that Plan 9 or Inferno make this trivial):

  • GNU Guix's paths:
    • the read only store /gnu/store/ where GNU Guix installs all software,
    • the /var/guix/ dir to communicate with the guix daemon,
  • the service scripts in /srv/listen/,
  • the log directory in /var/log/listen,
  • a runtime directory in /run/listen/,
  • the host's network.

Alas, Linux mirrors our low-growth, high-capital-returns economy: privilege remains an inherited property (Piketty and Zucman 2015). Despite Linux namespaces and capabilities, this inheritance model prevents unprivileged access to non-file resources like ports from a container [personal communication with the kernel developers]. One example: Linux does not honor setuid or setcap binaries within containers. Solving this issue would require

  • either moving away from the inheritable root-owns-everything model, or its somewhat finer-grained capabilities substitute,
  • or exposing non-file resources like ports as files.

We must for now fall back on basic Linux inter-process isolation.

To prevent any other user from messing with the ports listen manages, we set all ports as privileged.

Despite their complexity and coarse-grainedness, we make use of capabilities: user listen can remain wholly unprivileged but for its ability to bind to all ports. A good step in the right direction, compared with inetd, which must run as root!

We use Linux capabilities to grant user listen the right to bind to privileged ports (i.e. all the ports), using a setcap-ed wrapper named with-cap-bind, that only user listen can execute.

7.3. Second security layer: service script containerization

Despite listen having quite a low footprint on the system, it remains a sensitive target:

  • it can access the internet,
  • it can listen on any port,
  • it can kill or mess with existing service script processes.

Yet, the service script processes constitute a more probable target than listen itself. Indeed, they do not just pass data around as listen does: they handle this untrusted user data. Attackers typically exploit complex operations on untrusted data, such as parsing.

Because service scripts present a big attack surface, they must run in an environment so isolated that a full takeover would not negatively impact the rest of the system much.

To this effect they run in containers. GNU Guix's guix shell creates the service containers. They only expose:

  • GNU Guix paths,
  • the service-specific (e.g. /run/listen/tcp7) runtime directory,
  • the service-specific (e.g. /var/log/listen/tcp7) log directory.

Note that the script cannot access the network! Communication with the outside happens through the UNIX domain socket /run/listen/tcp7/socket.
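As an illustration of such a container, the sketch below only builds a guix shell command line, one argument per line. The function name container_argv is hypothetical, and the in-container target /var/log and the package list are assumptions. The text does not say what command runs inside the container to serve /run/socket, so that part is left out.

```shell
# Sketch: print the guix shell arguments for one service container.
# Assumptions: in-container paths /run and /var/log, packages bash and coreutils.
container_argv() {
    local service="$1"
    # No --network flag: the container then only sees a loopback device.
    # --no-cwd keeps the caller's working directory out of the container.
    printf '%s\n' \
        guix shell --container --no-cwd \
        "--share=/run/listen/$service=/run" \
        "--share=/var/log/listen/$service=/var/log" \
        bash coreutils
}
```

Calling container_argv tcp7 lists the arguments that would expose only the tcp7 runtime and log directories, with guix shell itself providing the store paths the chosen packages need.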

7.4. Passing data to the unnetworked daemon

The /srv/listen/tcp* service scripts get their data via the /run/listen/tcp*/socket socket (footnote 8: note that this path appears as /run/socket to the service script, see figure 4).

The brilliant utility socat acts as plumbing between any two kinds of process input/output (sockets, files, standard input/output/error, etc.).

For e.g. port 7, listen calls it this way:

/run/setuid-programs/with-cap-bind socat -dd \
    TCP4-LISTEN:7,fork,reuseaddr \
    UNIX-connect:/run/listen/tcp7/socket \
    >> /var/log/listen/tcp7/listen.log 2>&1 &

socat runs with the following wrapper, arguments, and options:

  • with-cap-bind: grants socat the right to bind to privileged ports (see section 7.2).
  • -dd: enables logging the remote IP addresses. These logs can then feed tools like fail2ban to react against abuse.
  • fork: socat handles each client connection in a child process, which connects to the UNIX domain socket, while the parent keeps listening on the TCP socket.
  • reuseaddr: the socat daemon might die with its TCP socket on port 7 in the TIME_WAIT state. With the socket in this state, no new socat daemon can start again: it would get an "address already in use" error. Leaving the TIME_WAIT state can take up to 4 minutes. To avoid such a delay, socat ignores the TIME_WAIT state thanks to the reuseaddr option.

Interested readers will find an explanation of the TIME_WAIT state in the UNIX socket FAQ.
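The listening half of the invocation above follows mechanically from the service file name. A minimal sketch, assuming a hypothetical helper listen_address that handles only tcp names and rejects everything else:

```shell
# Sketch: derive socat's listening address from a service file name.
# Only names of the form tcp<port> are accepted here.
listen_address() {
    local name="$1" port
    port=${name#tcp}
    [ "$port" != "$name" ] || return 1        # name does not start with "tcp"
    case $port in
        ''|0*|*[!0-9]*) return 1 ;;           # empty, leading zero, or non-digit
    esac
    [ "${#port}" -le 5 ] && [ "$port" -le 65535 ] || return 1
    printf 'TCP4-LISTEN:%s,fork,reuseaddr\n' "$port"
}
```

For instance, listen_address tcp7 prints TCP4-LISTEN:7,fork,reuseaddr, while an alias name such as finger is rejected with a non-zero status.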

7.5. Aliases

Port numbers make for a terrible user interface. To use service names instead, one just has to name the service script after the service, and make the <protocol><port> file a symbolic link to it, e.g.

ln -s /srv/listen/finger /srv/listen/tcp79

This allows the owner to edit /srv/listen/finger to change the finger service, instead of having to remember that finger clients connect to port 79.

It also allows authenticated users to start the script by name:

ssh alice@the-dam.org /srv/listen/finger

The script will have an effective user id of alice (footnote 9: unless the script's author has set its setuid bit), and might therefore give more information than the anonymous version accessible to anyone. By contrast, listen starts the anonymous version with an effective user id of user listen, who cannot read much on the system.
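To find out which name a port file stands for, one can resolve the symbolic link. A small sketch, in which alias_of is a hypothetical helper and readlink -f is assumed to be available (as it is with GNU coreutils):

```shell
# Sketch: print the service name a port file is an alias of.
# Prints nothing when the file is a regular script rather than a link.
alias_of() {
    local file="$1" target
    [ -L "$file" ] || return 0
    target=$(readlink -f -- "$file") || return 1
    basename -- "$target"
}
```

With the link created above, alias_of /srv/listen/tcp79 prints finger.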

8. Alternatives to listen

8.1. Plan 9's listen and Inferno's svc

Studying the source code of the fourth edition of Plan 9, 9front, and Inferno, and asking 9fans for examples of listen service scripts (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024), reveals some usage patterns.

Namespace files: while in the default installation few services appear to have a custom tcpXXX.namespace file, some services like ftp and http do. The others use the site-specific default namespace file /lib/namespace. These put the files to be served where the service expects them, without any complex software configuration mechanism.

Seamless authentication: On Plan 9 and Inferno, a user's identity spans a whole cluster, not just a single machine. Authentication is baked into the 9P protocol, and requires either less than 15 lines of boilerplate, or direct handling by some standard middleware. Switching a process' identity, say from alice to bob, requires the caller to prove to the auth server that it knows bob's secret. This is automagically handled by the factotum at both ends of the connection. The namespace files benefit implicitly from that mechanism, provided the currently mounted factotum knows the required secrets.

Simple scripts: Only the two simplest services (the echo and discard protocols) in the default installation are fully implemented as shell scripts. The rest are custom servers, usually written in C for Plan 9 and Limbo for Inferno, or thin shell wrappers around those custom servers. However, reports from the 9fans mailing list (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024) praise the ease with which one can write a custom network service for a specific use case, such as collecting vac hashes after a backup, a text-based zine, a connection helper to some MUDs, etc.

/net usage: The synthetic /net hierarchy (Presotto and Winterbottom 1993) is widely used among the listen service scripts, which get it as an argument. As with all the resources on Plan 9, this part of the filesystem can be exported and shared, in total or in part, between different processes on different machines.

We want to transpose these patterns to Linux, despite the lack of integration of two of the basic primitives of the Plan 9 experience:

  • per-process view of the filesystem (namespaces),
  • unprivileged modification of said view.

This lack of integration on Linux was compensated by

  • the use of Guix containers (which offer a per-container view of the filesystem, see section 7.3),
  • and our new f29p FUSE 9P client, which allows for unprivileged mounting of 9P2000.L servers from within a container, something that was absolutely impossible before, see section 2.2 (footnote 10: while technically true, this sentence disregards the fact that some couples of server/clients will work from within a container, see the aside in section 2.2).

Namespace scripts replace namespace files, as there are no newns or addns on Linux.

Process and file ownership and permission make a poor, but workable, substitute for the seamless 9P authentication mechanism of Plan 9. As detailed in section 6.3, one can run a 9P server as one identity, and use the socket's ownership and permission to allow another user to benefit from this identity's rights. The alternative, which inetd uses, would be to run as root.

/net is replaced by a single socket from which the service script can talk to the incoming connection. A Linux network namespace isolates the service script from the host's network. Compromises made due to the complex and mostly undocumented interactions between Linux namespaces and Linux capabilities (see section 9.2) make porting /net to Linux a very worthwhile target to improve listen.

8.2. (x)inetd

On Linux and BSD, inetd used to be the tool that would, like listen, launch a network service when a client connects and forward data back and forth between the connection and the service's standard streams.

inetd has fallen out of fashion:

  • most servers have grown in complexity and now do request management on their own.
  • inetd listens on behalf of several servers, which start on demand when a request comes in. This saves CPU and memory, but increases latency, and costs one process per request.
    • The CPU and memory gains decreased to insignificance because of decreasing hardware costs.
    • Conversely, from a business perspective, the cost of latency has gone up.
    • Processes cost more than threads (on Linux). Self-managed servers (such as nginx) use a single process (sometimes one process per core) and handle many requests within it, in threads or an event loop. This makes inetd more expensive under load than a single more complex server.
    • Nowadays, people use containers (with podman, docker, kubernetes, etc.). Containers typically simulate a whole operating system to run one or a small handful of services. Container orchestration software fulfills inetd's role and more.
    • Like most good things from UNIX, inetd's role now falls to systemd, the cancer that will eat Linux's userspace. Almost all modern distributions lack an inetd (except for a handful of indomitable Gauls, that is, sensible Linux distributions like GNU Guix). BSDs have kept their sanity and their versions of inetd.

inetd reads its configuration from a text file, typically /etc/inetd.conf. Editing this file requires privileged access. This leads to difficulties in a multi-user context. listen addresses this.

inetd also provides each service process with a complete view of the system, limited only by the ownership under which said service process starts. In contrast, listen places security first by running the service scripts in an initially void namespace.

8.3. Authbind

authbind has existed since 1998. It allows access to privileged ports on a per-user and per-port basis. It does so by interposing on an application's call to bind (through LD_PRELOAD). The application ends up calling authbind's version, which uses a privileged helper program to call the real bind in the application's stead.

8.4. Systemd

systemd offers to run ephemeral services, as inetd does. However, using this functionality means you also have to choose systemd for everything else the systemd developers bullied their way into (including, but not limited to: your login manager, your desktop environment, your DNS, your logs).

9. Further work and position

9.1. Unprivileged 9P mounts

When we began this adaptation work, we naively believed that Linux capabilities, containers, and the Linux kernel's support of 9P would allow us to emulate Plan 9's per-process namespace on Linux.

The first bad surprise happened when we discovered that within a Linux mount namespace, one can only mount a handful of filesystem types, the only non-trivial one being FUSE. 9P is excluded. We therefore had to write a 9P-to-FUSE wrapper (see section 2.2) instead of relying on the kernel's v9fs.

We would like to work in the direction of a patch that would allow in-container unprivileged mounts of 9P filesystems on Linux.

9.2. Porting /net, to avoid mixing capabilities and Linux namespaces

Another bad surprise happened when we discovered that setuid binaries and, worse, setcap binaries have no effect in Linux user namespaces. After emailing the kernel developers themselves, we got confirmation that what we wanted to achieve, using a setcap binary to grant listen the ability to bind to privileged ports from within a Linux user namespace, was impossible.

This makes it impossible to run listen in its own namespace, as we initially wanted to do (see section 7.2). Instead, it gets a full view of the system, limited only by its identity.

Porting /net to Linux would allow us to run listen in a container in which the host's /net is 9P-imported, and to stop caring about the Linux capabilities feature, of which it was mildly said that "coherence in its design and implementation are not particularly evident" (footnote 11: https://lwn.net/Articles/632520).

9.3. Pledge

After running their initialization code, and just before exec-ing into listen's provided command, namespace scripts should freeze their namespaces, and prevent the service script from doing anything not strictly necessary to answer the clients' requests. One way to do that is to port OpenBSD's pledge to bash. Justine has written an awesome port of pledge() to Linux. We wish to incorporate it in a loadable bash builtin, which would allow pledges to happen mid-process in bash, and not just as a wrapper to a command.

9.4. Tighten the capable wrapper

listen's ability to bind on privileged ports hinges on the with-cap-bind wrapper. As of <2024-01-08 Mon> this wrapper supports any command. Its only use consists of a socat invocation that redirects data from a port to a service container's /run/socket:

/run/setuid-programs/with-cap-bind "$(which socat)" -dd \
    TCP4-LISTEN:"$port",fork,reuseaddr \
    UNIX-connect:/run/listen/"$service"/socket \
    >>/var/log/listen/"$service"/listen.log 2>&1 &

We should make the port number the only argument to the wrapper, and bake the socat invocation into the wrapper. That way, when an attacker gains remote code execution as user listen, it can not bind arbitrary services to any port unless it also gains write rights to /srv/listen (which user listen does not have).
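The argument handling of such a tightened wrapper could look like the sketch below. It is only an illustration: fixed_socat_command is a hypothetical name, the check that /srv/listen/tcp<port> exists and is executable is our assumption, and the second parameter exists solely so the sketch can be exercised outside /srv/listen. Since Linux ignores file capabilities on interpreted scripts, the real wrapper would have to be a compiled program with the directory hard-coded.

```shell
# Sketch: validate the single port argument and print the one socat
# command line the wrapper would run, one argument per line.
# The optional second parameter is for experimentation only.
fixed_socat_command() {
    local port="$1" srv="${2:-/srv/listen}"
    case $port in
        ''|0*|*[!0-9]*)
            echo "not a port number: $port" >&2
            return 64 ;;
    esac
    if [ "${#port}" -gt 5 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 64
    fi
    if [ ! -x "$srv/tcp$port" ]; then
        echo "no executable service script $srv/tcp$port" >&2
        return 66
    fi
    printf '%s\n' socat -dd \
        "TCP4-LISTEN:$port,fork,reuseaddr" \
        "UNIX-connect:/run/listen/tcp$port/socket"
}
```

Because nothing but a validated number reaches the socat arguments, a caller cannot smuggle in another command or another socket path.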

9.5. Listen services

Our work on listen started when we looked for a clean way to write a finger server and got frustrated that none existed. With our finger service operational, we have now set our sights on new protocols aiming for simplicity, like gemini and nostr.

9.6. Preventing resource exhaustion

As mentioned in section 6.5, an attacker taking over a service process could exhaust the system's resources. We are working on a fix, but Linux's API for resource limits has complex interactions with Linux namespaces that we do not fully understand yet.
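For reference, the conventional per-process building block is the shell's ulimit builtin. The sketch below is not the fix we are working on: run_limited is a hypothetical helper, the values are illustrative, and such limits apply to each process individually rather than to a container as a whole.

```shell
# Sketch: run a command with lowered per-process limits.
# The function body is a subshell, so the caller's own limits stay untouched.
run_limited() (
    ulimit -n 64    # at most 64 open file descriptors
    ulimit -t 10    # at most 10 seconds of CPU time
    exec "$@"
)
```

For example, run_limited sh -c 'ulimit -n' reports the lowered descriptor limit from inside the child.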

9.7. Down with port numbers

Securing the listen process proved the most frustrating part of the design work.

For example: the question "how do unprivileged users bind to privileged ports?" admits no good answer, because the question itself is wrong on so many levels:

  • Port numbers themselves stem from TCP emerging from earlier protocols (see the early RFCs 322, 349, 433 and those that obsolete them), and a clean design would probably elect to eschew them, leveraging a \(2^{128}\) address space to allow process-to-process communication, instead of the route-to-host, then route-to-process dance we do now.
    • The host to process frontier should be an implementation detail on the receiving end, not baked so deeply in the stack.
    • This barrier may even change from request to request as new hosts come up or down depending on load.
    • This already happens anyway with e.g. kubernetes, but we would have less cruft if it was baked into the protocol.
  • We should use strings instead of numbers to specify the protocol.
  • Apart from ease of implementation and historically underpowered hardware, why does one need to specify privileged ports as a single range? One should be able to restrict access on a per-port basis.
  • Even then, why should one manage ports as a monolithic entity that either all users, or no user but root, can access?
  • Ports should benefit from a fine-grained access control API, like files do. Exposing ports as virtual files would allow that.

10. Conclusion

The work herein proposed is but a leaky dam, failing to contain the waterweight of historical legacy.

11. Changelog

  • <2024-04-12 Fri> Recognize that other 9P client/server couples work from within a container.

12. Bibliography

Auth(2) – 9front’s Manual. 2024. http://man.9front.org/2/auth.
Courtès, Ludovic. 2013. “Functional Package Management with Guix.” Arxiv Preprint Arxiv:1305.4584.
———. 2015. “Service Composition in GuixSD.” https://guix.gnu.org/en/blog/2015/service-composition-in-guixsd/.
———. 2018. “Multi-Dimensional Transactions and Rollbacks, Oh My!” https://guix.gnu.org/en/blog/2018/multi-dimensional-transactions-and-rollbacks-oh-my/.
Cox, Russ, Eric Grosse, Rob Pike, Dave Presotto, and Sean Quinlan. 2002. “Security in Plan 9.” In 11th Usenix Security Symposium (Usenix Security 02).
Fork(2) – 9front’s Manual. 2024. http://man.9front.org/2/fork.
Kerrisk, Michael. 2013. “Namespaces in Operation, Part 1: Namespaces Overview [LWN.Net] –- Lwn.Net.” https://lwn.net/Articles/531114/.
Klein, Edouard. 2023. “Tutorial: Add Beaver Labs’ Channel to Guix –- the-Dam.Org.” https://the-dam.org/docs/tutorials/BeaverLabsChannel.html.
Klein, Edouard, and Guillaume Gette. 2023. “Dr Glendarme or: How I Learned to Stop Kerberos and Love Factotum.” Iwp9 2023.
Namespaces(7) – Linux User’s Manual. 2023. https://www.man7.org/linux/man-pages/man7/namespaces.7.html.
Neidhart, Pierre. 2019. “Guix: A Most Advanced Operating System.” https://web.archive.org/web/20210528220842/https://ambrevar.xyz/guix-advance/index.html.
Pike, Rob, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom. 1993. “The Use of Name Spaces in Plan 9.” Acm Sigops Operating Systems Review 27 (2): 72–76.
Piketty, Thomas, and Gabriel Zucman. 2015. “Wealth and Inheritance in the Long Run.” In Handbook of Income Distribution, 2:1303–68. Elsevier.
Presotto, David L, and Phil Winterbottom. 1993. “The Organization of Networks in Plan 9.” In Usenix Winter, 271–80.
Six, Tom. 2010. “The Human Centipede (First Sequence).” https://www.imdb.com/title/tt1467304/.
Unshare(1) – Linux User’s Manual. 2023. https://man7.org/linux/man-pages/man1/unshare.1.html.
“Can’t Clone a Git Repo over Anonymous SSH –- Issues.Guix.Gnu.Org.” 2023. https://issues.guix.gnu.org/64648.
“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread.” 2024. https://marc.info/?t=170937608300001&r=1&w=2.
“Service Composition – (GNU Guix Reference Manual).” 2024. https://guix.gnu.org/manual/en/html_node/Service-Composition.html.