Adapting Plan 9's listen to GNU Guix
Here is a comprehensive adaptation of Plan 9's elegant network service
management design to the Linux environment, focusing on the Guix System
distribution. The proposed listen utility initiates network services by
executing files named after the protocol and port they serve. This approach
offers significant advantages over traditional Linux setups: per-user, per-port,
and per-program allocation of ports, to be contrasted with the binary
privileged/unprivileged model on Linux; enhanced security through process
isolation; and network transparency for service scripts. We also detail the
development of auxiliary tools and contributions such as a Go-based 9P2000.L
FUSE client needed for container isolation, an improvement to the p9ufs
9P2000.L server, and a network-transparent implementation of the finger
protocol. We straightforwardly achieve a level of simplicity and security that is
currently only achievable on Linux with complex configurations or not at all.
The paper concludes with reflections on the challenges and limitations
encountered in adapting Plan 9's models to the Linux platform, pointing out the
inherent difficulties in reconciling Linux's legacy structures with Plan 9's
more streamlined and network-native approach.
1. Introduction
Plan 9 uses a piece of software called listen to start network services.
Network services are defined by the presence, in a directory watched by
listen, of executable files whose name is of the form <protocol><port>.
For example, the tcp7 file implements the echo protocol on port 7.
When a client connects on one of the ports, listen starts the corresponding
executable file, and
- forwards incoming data from the connection to the process' stdin,
- forwards outgoing data from the process' stdout back to the connection.
This elegant design presents three advantages over the current state of affairs on Linux:
- Instead of a root-owned configuration file, the use of one file per port allows a per-user, per-port, per-program allocation of ports (see section 5).
- Plan 9 being Plan 9, each process gets its own namespace, that is, its own view of the system resources, isolated from other processes. Such a level of security is available on Linux only through the use of containers (see sections 7.2 and 7.3).
- Network services written for listen are said to be network-transparent: they need not contain any network code at all. As they read and write on the standard streams, they require less work to write than the same networked service would; they can rely on the plethora of existing command line filters that, too, read from stdin and write to stdout. Example implementations of echo (section 6.1) and finger (section 6.2) are provided to illustrate this concept.
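To make the idea of network transparency concrete, here is a minimal sketch of a service that contains no network code at all. The service itself (upper-casing its input) and the function name serve_upper are hypothetical and are not part of listen; the point is only that the whole "server" is an ordinary stdin-to-stdout filter.

```shell
#!/usr/bin/env bash
# Hypothetical network-transparent service: it upper-cases whatever the
# client sends. It never opens a socket; it only reads stdin and writes stdout.
serve_upper() {
    tr '[:lower:]' '[:upper:]'
}

# Local try-out, with no network involved: the pipe plays the role of the
# connection that listen would otherwise plug into stdin and stdout.
printf 'hello\n' | serve_upper
```

Because the service is a plain filter, it can be exercised from a terminal or a pipeline before it is ever exposed on a port.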
This document describes our effort in adapting this design to Linux in general and to the Guix System distribution in particular.
The deployment of our version of listen on the Dam (our public access UNIX ("PubNix") server, running Guix System with Beaver Labs' guix channel) frames the exposition that follows. However, it should be noted that
listen is not dam-specific, or even Guix System (the operating system)
specific. While it requires GNU Guix (the package manager, see section 3.2)
to be installed, it can be (and has been, on my own workstation running Arch)
deployed on Linux distributions other than Guix System. Doing so requires
careful system administration and painful wrangling with systemd. Guix System
(the operating system) avoids this thanks to its declarative configuration
primitives (see section 4) and absence of systemd.
The version of listen presented here is in active use on the-dam.org, and
the echo (section 6.1), finger (section 6.2), and git (section 6.3) services
are reachable with, respectively:
echo hello | nc the-dam.org 7
echo | nc the-dam.org 79
git clone 'git://the-dam.org/listen'
2. Contributions
2.1. listen
The work presented herein yielded more than the listen bash script one can
download with:
git clone git@the-dam.org:listen
2.2. f29p
A Golang 9P2000.L FUSE client was developed from scratch:
git clone git@the-dam.org:f29p
This client is needed to mount 9P servers in the containers in which listen
isolates the network services (section 7.3). It is a seldom-documented
fact of Linux that unprivileged mounting is reserved to a select few
filesystems, among which 9P is not present. To our knowledge, no other publicly
available piece of software allows one to mount a 9P2000.L server from within a Linux
container.
2.3. p9ufs
A pull request implementing UnlinkAt was merged into p9ufs, a Golang 9P2000.L server: https://github.com/hugelgupf/p9/pull/87/files
2.4. os/listen
listen requires deep changes in the system it is installed on (for example, a
change in the default privileged port ranges, see section 5), but we
wrote a set of operating system configuration functions (see section
4) that make installing listen a literal one-line change in
a configuration file. These functions can be added to Guix System through our channel (Klein 2023).
(channel
 (name 'beaverlabs)
 (url "https://gitlab.com/edouardklein/guix")
 (branch "beaverlabs"))
2.5. fingerd
A network-transparent implementation of finger (see section 6.2) was
developed to work with our listen. It should work with Plan 9's listen as well,
as soon as somebody writes a Python 3 interpreter for Plan 9, which should be right
around the time we sunset IPv4 globally.
Fetch it with:
git clone git@the-dam.org:fingerd
3. Glossary
Before we delve any further into technical explanations, we need to make explicit that there are three sets of words whose meaning is highly ambiguous.
This ambiguity needs to be removed lest the article become very confusing for the reader.
3.1. Namespace
First comes namespace. A namespace in Plan 9 is the view a process has of the filesystem. Because every system resource is available as a file, a namespace is the view a process has of the system's resources (Pike et al. 1993).
On Linux, namespaces are an isolation mechanism bolted upon processes. Linux processes are not, by default, as isolated from one another as they are in Plan 9.
Because on Linux everything is definitely not a file, there are multiple kinds of namespaces. One can be in a network namespace, a mount namespace, a user namespace, etc. (Kerrisk 2013; Unshare(1) – Linux User’s Manual 2023, 2023)
To remove this ambiguity, we will explicitly specify "Linux namespace", "Linux mount namespace", etc. or use the improper term of containers when talking about the specific Linux isolation mechanism, and reserve the bare namespace term for the Plan 9 generic notion of a process' view of the system resources.
When we talk about the namespace of a Linux process, we talk about what it sees of the underlying system. This view is constructed with the help of different, potentially nested Linux namespaces, and 9P mounts through FUSE.
3.2. Guix
GNU Guix is a package manager (Courtès 2013), that can be installed on any Linux distribution.
Guix System is a Linux distribution that uses GNU Shepherd as its daemon manager, lacks systemd, and of course uses Guix as its package manager. It goes further by providing a declarative configuration system for the whole operating system, with atomic updates, roll-backs, etc. (Neidhart 2019)
Waters are muddied by the fact that the guix system command, provided by GNU
Guix the package manager (which can be installed on any Linux distribution),
allows one to instantiate a Guix System operating system as a VM, a container, a
docker image, etc.
We will use Guix to refer to the package manager, and Guix System to the operating system.
3.3. Service
In UNIX parlance, a service is a background process (a daemon), typically launched by a service manager (SysVinit, rc, shepherd, systemd, etc.). Examples include network services such as a web server, but also the cron daemon.
In the context of listen, services refer to the network services that
listen manages.
On Guix System, a service is a broader notion, that encompasses network services, daemons, but also any aspect of the system configuration, such as udev rules, user accounts, etc.
We will always specify, unless the context makes it absolutely clear, whether we are talking about network/listen services, or Guix services.
We will avoid calling UNIX services services, and use daemon instead.
4. Operating system configuration functions
As mentioned in section 2, listen is not merely a bash script, but
also a set of deep modifications to the operating system it runs on:
- installing listen, f29p and p9ufs;
- creating nobody-like user and group listen;
- making, chowning, and chmoding the following directories, according to some port attribution policy:
  - /srv/listen/, owned by root:root, with perms rwxr-xr-x,
    - in it, the tcpXXX and tcpXXX.namespace scripts, owned by e.g. alice:listen, with perms rw.r-.---;
  - /run/listen/tcpXXX/, owned by e.g. alice:listen, with perms rwxrwx---;
  - /run/9p/, owned by root:root, with perms rwxrwxrwt;
  - /var/log/listen/tcpXXX/, owned by e.g. alice:listen, with perms r-xrwx---.
- creating the default guix profile in /run/listen/profile;
- making sure /run/listen is deleted on reboot;
- setting the available source ports for outgoing tcp connections to 49152-65535 away from its default of 32768-60999;
- setting the privileged port range to 0-48152 away from its default of 0-1023;
- setting the cap_net_bind_service+p capability on the with-cap-bind wrapper, and making it r-x------ for listen:listen;
- creating, for each user, an HTTPS redirect from e.g. https://alice.example.com to the first port of alice's range;
- creating service aliases as in section 7.5, e.g. /srv/listen/finger -> /srv/listen/tcp79;
- starting the listen daemon on boot, but only after all the daemons it needs have started.
While it is technically possible to apply these modifications to a Linux system
using the usual system administration tools (sysctl, adduser, chown,
etc.), it would be an ill-advised tall order.
Using our additions to Guix System's declarative configuration mechanism is easier and safer.
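To give an idea of what the manual route involves, the two kernel settings from the list above can be sketched as follows. This is only an illustration: the helper name print_port_sysctls is hypothetical, and it prints the commands instead of running them, since they need root. The sysctl keys net.ipv4.ip_unprivileged_port_start and net.ipv4.ip_local_port_range are the standard Linux ones; the values are those quoted in the list.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) the sysctl commands matching the two
# port-range items of the list. Arguments:
#   $1  first unprivileged port (privileged range is 0 .. $1 - 1)
#   $2  lowest source port for outgoing connections
#   $3  highest source port for outgoing connections
print_port_sysctls() {
    local first_unprivileged=$1 ephemeral_low=$2 ephemeral_high=$3
    echo "sysctl -w net.ipv4.ip_unprivileged_port_start=${first_unprivileged}"
    echo "sysctl -w net.ipv4.ip_local_port_range=\"${ephemeral_low} ${ephemeral_high}\""
}

# A privileged range of 0-48152 means the first unprivileged port is 48153.
print_port_sysctls 48153 49152 65535
```

These two lines cover only a small part of the list; users, directories, permissions, profiles and daemon ordering would each need similar manual care.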
Familiarity with GNU Guix, Guix System, or GNU Guile is not expected from the
reader, as the examples provided here are quite self-explanatory to anyone with
any programming experience. Just understand that what is written f(a, b) in
most languages is (f a b) in GNU Guile, and that the last expression of a
function is its return value.
The provided primitives are based on an extension of Guix System's configuration
mechanism. Guix System relies on a directed acyclic graph: nodes being Guix
System services, and an edge from e.g. nginx to account denoting that
installing nginx will create a user account on the system. This graph is
folded (in the functional programming sense of fold) into a script,
collapsing all the extensions into a GNU Guile script that actually changes the
system so that it conforms exactly to the declaration (Courtès 2015; “Service Composition – (GNU Guix Reference Manual)” 2024).
While powerful, this mechanism is hard to extend as it requires familiarity with both Guix System and GNU Guile. We abstracted it away thanks to the use of functions that take an operating-system record as an argument, and return a modified operating-system record. These functions can thus be chained, human-centipede style (Six 2010), in a syntax much more familiar to users of imperative languages, not unlike a Dockerfile, while keeping all the power of the Guix System service-graph mechanism.
5. A fine-grained access control API for ports
Without listen,
ports on Linux fall under the coarse dichotomy of privileged and unprivileged.
Privileged ports are traditionally ports below 1024 (but a sysctl call to set the net.ipv4.ip_unprivileged_port_start kernel variable will change that).
One used to need to be root to bind to a privileged port,
gaining privileged access to the whole operating system as a side effect.
This changed when Linux got a new feature called capabilities.
The CAP_NET_BIND_SERVICE capability allows one
to bind to privileged ports on a per-program basis.
Having this capability grants no other rights;
but this capability applies to all privileged ports:
CAP_NET_BIND_SERVICE accepts no port-based configuration.
listen solves this problem because it equates ports with file names.
Using UNIX's file access control API (chown, chmod, etc.),
one controls access to ports on a per-port, per-user
(and in turn, with the setuid bit, per-program as well) basis.
By keeping an empty, world-writable /srv/listen/ directory,
root can let any user bind to any port.
At the other end of the spectrum, root can create all the tcp* files
(from tcp1 to tcp65535, one per port),
and chown each of them to
whomever should control the associated port.
The files remain non-executable
until the appropriate user wants to activate the server,
at which point she runs chmod +x on the file.
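The activation step can be tried out in a scratch directory standing in for /srv/listen/. The sketch below is an illustration only: the port 8080, the variable demo_dir and the helper is_active are hypothetical, and the chown step is left out because it needs root.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for /srv/listen/.
demo_dir=$(mktemp -d)

# What root would do: pre-create a non-executable service file for the port.
# (On a real system root would also chown it to the user controlling the port.)
touch "$demo_dir/tcp8080"
chmod 640 "$demo_dir/tcp8080"

# A port counts as active when its service file is executable.
is_active() {
    if [ -x "$1" ]; then echo active; else echo inactive; fi
}

is_active "$demo_dir/tcp8080"   # state before activation
chmod +x "$demo_dir/tcp8080"    # the owner activates the service
is_active "$demo_dir/tcp8080"   # state after activation
```

The only privileged operations are the creation and chown of the file; turning the service on and off is an ordinary chmod by its owner.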
On the-dam.org,
the last few bits of the hash of the username yield a port number,
and we give that user a 12-port range starting at the number derived from the hash.
This lets us avoid any kind of bookkeeping for port allocation:
users come and go,
each gets an automatically assigned range
with a negligible risk of collision.
We also offer an https redirection
from e.g. https://alice.the-dam.org to the first port in alice's range on the-dam.org.
This port allocation is automatically derived from the list of human users any time a user is added or removed, with no need for human intervention.
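A hash-derived allocation of this kind could look like the following sketch. Everything numeric here is an assumption made for illustration: the document does not say which hash function, base port, or number of bits the-dam.org uses, so cksum, BASE_PORT and SLOTS are placeholders; only the 12-port range size comes from the text.

```shell
#!/usr/bin/env bash
# Assumed parameters (not taken from the actual deployment).
BASE_PORT=10000   # hypothetical first port handed out to users
RANGE_SIZE=12     # ports per user, as stated in the text
SLOTS=2048        # hypothetical number of ranges: keeps the last 11 bits

# Print the first port of the range assigned to user $1.
port_range_start() {
    local user=$1 hash
    # cksum prints "<CRC> <byte count>"; keep the CRC only.
    hash=$(printf '%s' "$user" | cksum | cut -d ' ' -f 1)
    echo $(( BASE_PORT + (hash % SLOTS) * RANGE_SIZE ))
}

start=$(port_range_start alice)
echo "alice: ports ${start}-$(( start + RANGE_SIZE - 1 ))"
```

Since the range is a pure function of the user name, it can be recomputed at any time from the list of users, which is what removes the need for bookkeeping.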
6. Network service scripts
Let's see how listen works on the-dam.org,
from the point of view of the user.
The implementation is detailed later, in section 7.
We will study three different network services in increasing order of complexity:
- the echo service on port 7, which echoes back whatever the clients send,
- the finger service on port 79, implemented as a simple Python script reading and writing to and from the standard streams,
- the git protocol on port 9418, implemented with git daemon, which insists on listening on a port instead of using the standard streams.
In contrast with what almost always happens on Linux, where a process gets a full
view of the system limited only by its owner's identity, listen launches
network service scripts in an extremely limited namespace, as Plan 9's listen
does. Our listen must use a container to achieve this while Plan 9's fork
(Fork(2) – 9front’s Manual 2024) offers this kind of isolation for free. Network services running in
an empty namespace are not very useful. This section shows how users can
populate their network service's namespace.
6.1. Echo service
The echo protocol is implemented by linking cat to /srv/listen/tcp7:
ls -l /srv/listen/tcp7
lrwxrwxrwx 1 root root 65 Mar 21 21:14 /srv/listen/tcp7 -> \
/gnu/store/mppp9hwxizx9g9pikwcvvshb2ffxyq7p-coreutils-9.1/bin/cat
This link is created by Beaver Labs' listen/echo function:
(define (listen/echo os)
  "Return a copy of OS, in which listen's echo (tcp7) service is active."
  (extend-service os activation
    #~(begin
        (when (file-exists? "/srv/listen/tcp7")
          (delete-file "/srv/listen/tcp7"))
        (symlink #$(file-append coreutils "/bin/cat")
                 "/srv/listen/tcp7"))))
This service can run in the empty namespace that listen provides by default.
6.2. The finger service
The above echo example ranks among the simplest things one can do with
listen. Let's now study a more complex service, the finger service listening
on port 79. Our finger implementation is a simple 68-line Python script that
reads a query on its standard input, parses it, sets some of the env vars from
the CGI specification (allowing queries like finger 'hello?name=alice&greeting=Howdy@the-dam.org'), and execs the requested script
in /srv/finger/. This script's output is what the remote finger client gets.
This network service is active now, and one can simply query it with a finger client or by running:
echo | nc the-dam.org 79
to get a list of the available user names.
finger illustrates how to clear
two hurdles most real-life use cases will meet:
- it needs to access data from outside the container,
- it relies on custom software, unavailable from the vanilla version of guix provided by listen's container.
Accessing data from outside listen's container is done, as it is in Plan 9, by
mounting 9P servers in the service's namespace.
listen will look for a /srv/listen/tcp79.namespace file. This is not, as it
is in Plan 9, a namespace file understood by newns or addns (Auth(2) – 9front’s Manual 2024).
It is a proper script, called by listen, and expected to set up its own
environment, its own view of the filesystem, and its own network before calling
exec $@, relinquishing control to a command built by listen.
The default namespace script, /srv/listen/listen.namespace, called in the
absence of a service-specific script, is quite simple:
#!/gnu/store/aslr8ym1df4j80ika5pfxy5kbfv4iz3w-bash-5.1.16/bin/bash
set -euo pipefail
# This is the default namespace file for listen services. This is where you
# mount the 9P services all your listen services need.
exec "$@" # Once done, call the command provided as an argument by listen
The declarative configuration will not overwrite this file if it already exists, making it easy to configure the default namespace in a site-specific manner.
finger's namespace file contains:
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /srv/finger
f29p unix!/srv/9p/finger /srv/finger &
exec "$@"
This mounts the 9P server listening on /srv/9p/finger to /srv/finger.
This 9P server daemon is started and managed by shepherd, Guix System's daemon
manager, thanks to a call to os/9p-serve in the function that configures
finger:
(os/9p-serve "/srv/finger/" "/srv/9p/finger"
             #:name '9p-finger
             #:user "listen"
             #:owner "listen"
             #:group "listen"
             #:mode "700")
This call creates a socket on /srv/9p/finger. On the socket listens an
instance of p9ufs, a modern Golang 9P2000.L implementation. This process is
owned by listen, and so is the socket.
On Plan 9, namespace operations fail or succeed thanks to the authentication
mechanism embedded in 9P. This authentication mechanism relies on the factotum
process mounted on /mnt/factotum having the required credentials (Cox et al. 2002).
Despite a previous attempt at porting this mechanism to Linux (Klein and Gette 2023),
it is not readily available to listen. Instead, to control the
operations on files outside of the service container, listen relies on the
ownership of the p9ufs process and the ownership and permissions of the socket
file.
In this particular case, listen gets a read-only by default (one has to pass
#:read-write "1" to os/9p-serve to get read-write access) view of
/srv/finger, with the same rights as it would have outside of the container. In
the next section we will see an example of user git "lending" its read rights
on /srv/git to listen (which can't read /srv/git, whose owner, group, and
mode are git:users r-xrwx---).
This choice makes it possible for e.g. alice to provide finger information
to her servermates only, who, belonging to the users group, may call her
script while logged in, whereas anonymous users from the internet, being
confined to listen's identity, won't see her script as executable:
ls -l /srv/finger/alice
-rwxr-x--- 1 alice users 121 Mar 18 10:57 /srv/finger/alice
Using custom software is done with a Guix profile. While on Plan 9 one makes
software accessible by binding various directories over /bin, the story is
more involved on Linux. Due to constraints imposed by ubiquitous dynamic
linking, interpreted languages, and mostly-standardized-but-not-quite default
paths, one has to rely on a myriad of search-paths. Search-paths are
environment variables that tell the software where to find the resources it
needs.
guix can set up profiles (Courtès 2018) that are links to its immutable,
content-addressed store. In it are synthetic directories with links to all
the needed resources (which also live in the store), as well as a profile
script that will set the search-paths to those resources.
listen will load the profile in /run/listen/tcp79/profile before calling the
service namespace script and the service script. A default profile is used if
/run/listen/tcpXXX/profile does not exist.
In the case of finger, the profile is created by a call to os/profile:
(os/profile "/run/listen/tcp79/profile"
            #:name 'finger-profile
            #:packages '("python" "the-dam-org-f29p"))
which sets up two packages: the f29p 9P FUSE client, and the Python
interpreter.
6.3. The git daemon
As revealed by an analysis of the fourth edition source code, 9front's source code, and a question asked on 9fans (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024), Plan 9 relies, for the following protocols (as well as a few others), on servers reading and writing on the standard streams: ftp, (ssh), telnet, smtp, http, pop, imap, samba, rlogin, lp, 9P (of course).
Some of those servers, such as rc-httpd, have been ported to BSD or Linux, but
others rely for example on /net (Presotto and Winterbottom 1993), which does not exist
(yet?) on Linux.
On the-dam.org, git repositories are made available through a user called git,
as which anybody can connect through ssh, e.g.:
git clone git@the-dam.org:listen
However, a bug (“Can’t Clone a Git Repo over Anonymous SSH –- Issues.Guix.Gnu.Org” 2023) prevents Guix from using this kind of access, and so we must open port 9418 to let git use its own unauthenticated, unencrypted protocol (but Guix checksums the code after fetching so those two points are not that big of a deal).
Git does provide an easy way to run a server speaking this protocol: just run
git daemon. However, git daemon binds to port 9418, and does not use the
standard streams.
One could extract the git protocol from the code and create a standalone standard stream network transparent server, but that is a lot of work.
What we do instead is abuse the namespace script. Instead of relinquishing
control to a command crafted by listen, here is what tcp9418.namespace does:
#!/usr/bin/env bash
set -euxo pipefail
mkdir -p /srv/git
f29p unix!/srv/9p/git /srv/git &
ip link set lo up
git daemon --base-path=/srv/git --export-all&
exec socat -dd UNIX-listen:/run/socket,fork TCP-CONNECT:127.0.0.1:9418
Lines 3 and 4 play the same role as for finger: making the necessary data
available to the service by mounting a 9P server.
Said 9P server is started thanks to this line in the operating system configuration
function listen/git-daemon:
(os/9p-serve "/srv/git/" "/srv/9p/git"
             #:name '9p-git
             #:user "git"
             #:owner "listen"
             #:group "listen"
             #:mode "700")
Here we see that only user listen can use the socket to mount the 9P server,
but the process listening on the other end is not owned by listen but by
git. This is because listen is not privileged enough to read the actual
/srv/git. By making user git run the 9P server, listen gets exactly the
kind of access it needs, on a very specific and well-defined subset of the global
filesystem.
Line 5 of the git daemon namespace script sets up a loopback network interface.
Behind the generic "container" term hide Linux' namespaces (same name as Plan
9, different feature). Important here is a Linux network namespace, that makes a
process believe it owns all network interfaces (it does, but not the actual
ones). Setting a dummy loopback interface on line 5 lets us start git daemon
on line 6, where it will happily listen on port 9418, blissfully unaware that
nobody but its parent can see the interface it is listening on.
Instead of calling exec $@ as expected, the namespace script execs
socat in such a way that any connection to the /run/socket socket is
forwarded to the container's port 9418. Creating this socket is the signal
listen is waiting for before it starts forwarding incoming data from the
internet to the service. Failure to make it appear within a handful of seconds is
a sign that the service failed to start.
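The "wait for the socket to appear" handshake can be sketched as a small polling loop. This is not the actual code of listen, which is not shown here: the helper name wait_for_socket, the polling interval and the number of tries are all assumptions made for the sake of illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a UNIX domain socket exists at $1.
# $2 is the number of 0.1 s tries (default 50, i.e. about five seconds).
# Returns 0 as soon as the socket shows up, 1 if it never does.
wait_for_socket() {
    local path=$1 tries=${2:-50}
    while [ "$tries" -gt 0 ]; do
        # -S is true when the path exists and is a socket.
        [ -S "$path" ] && return 0
        sleep 0.1
        tries=$(( tries - 1 ))
    done
    return 1
}

if wait_for_socket /run/socket 20; then
    echo "service is up, forwarding can start"
else
    echo "service failed to start" >&2
fi
```

Treating the socket's existence as the readiness signal means the namespace script does not need any other channel to report back to listen.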
Usually listen calls socat inside the container like so:
socat -dd UNIX-listen:/run/socket,fork EXEC:/srv/$service
where /srv/$service is of course the service script, which will communicate
with the internet through its standard streams.
By hijacking the expected behaviour and calling socat by itself,
tcp9418.namespace can forward data to git daemon's port instead.
As for the tcp9418 script, it is just a link to true. It is never called:
its presence only serves to make listen start the service.
To make git, ip, etc. available to the service,
/run/listen/tcp9418/profile is created by a call to:
(os/profile "/run/listen/tcp9418/profile"
            #:name 'git-profile
            #:packages '("git" "iproute2" "coreutils" "bash" "socat" "the-dam-org-f29p"))
6.4. To recap
From the user's point of view, service scripts are fully network transparent. They read data on their standard input and write answers on their standard output. One instance of the script will be launched for each new connection.
Service scripts may specify software dependencies by setting a Guix profile in
/run/listen/tcpXXX/profile, either manually (by calling guix install
--profile=/run/listen/tcpXXX/profile ...) or with the declarative operating
system configuration function os/profile.
Service scripts are run in a container fully isolated from the rest of the
system. To access data from the outside, one mounts a 9P server in the service's
companion namespace script. This 9P server can be started manually (calling
p9ufs), or through the use of the declarative operating system configuration
function os/9p-serve.
Abusing the service namespace script is possible to let non-network transparent
servers run without having to rewrite them to use the standard streams. This
facility allows typical web-applications to be run with no modifications.
the-dam.org redirects https://alice.the-dam.org to the first port in alice's
range, and does so for all users, making web application deployment easy,
and rootless.
6.5. To go further
Additional configuration abstractions are available, for example the
unprivileged counterparts to os/profile and os/9p-serve are home/profile
and home/9p-serve. They allow non-root users to run their own 9P servers and
create their own profile, so that the service script they own can access them
without root having to intervene. Describing them is outside the scope of this
document, but they are available to the-dam.org users.
Also, the register-listen-service function is available to inform the
shepherd that some daemons (typically, the 9P servers) shall be started before
listen.
The bareness of the container in which the service script runs provides two main advantages:
- It increases security by isolating the service script from the rest of the system. An attacker gaining remote code execution in the container has no obvious way to impact the rest of the system except a denial of service through resource exhaustion (e.g. a fork bomb).
- It forces the explicit declaration of all used resources (software dependencies, files) and processes (in our example, the git server). This explicitness makes it trivial to run the exact same script in a slightly different container for development, testing, or failover purposes.
7. How listen works
7.1. Configuration
listen will loop over the files in /srv/listen/ whose name matches tcpXXX
where XXX denotes a port number.
When it finds an executable file,
for example /srv/listen/tcp7, listen:
- creates an isolated container in which it:
  - activates the service's profile at /run/listen/tcp7/profile if it exists, or the default one at /run/listen/profile otherwise,
  - runs the service namespace script /srv/listen/tcp7.namespace if it exists, the default namespace script /srv/listen/listen.namespace otherwise,
  - runs socat to link the /run/listen/tcp7/socket socket to the service script /srv/listen/tcp7 standard streams,
  - logs the standard error in /var/log/listen/tcp7/service.log.
- Then outside the container, it launches a daemon that will listen on port 7,
  and forward data back and forth to and from the service script (via /run/listen/tcp7/socket).
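The two "service-specific if it exists, default otherwise" rules above can be sketched as follows. The helper names pick_profile and pick_namespace_script are hypothetical and do not come from the listen source; the root directories are passed as arguments so that the logic can be tried on a scratch tree.

```shell
#!/usr/bin/env bash
# Print the profile to activate for service $2 under runtime root $1:
# the service-specific one if present, the default one otherwise.
pick_profile() {
    local run_root=$1 service=$2
    if [ -e "$run_root/$service/profile" ]; then
        echo "$run_root/$service/profile"
    else
        echo "$run_root/profile"
    fi
}

# Print the namespace script to run for service $2 under script root $1.
pick_namespace_script() {
    local srv_root=$1 service=$2
    if [ -e "$srv_root/$service.namespace" ]; then
        echo "$srv_root/$service.namespace"
    else
        echo "$srv_root/listen.namespace"
    fi
}

pick_profile /run/listen tcp7
pick_namespace_script /srv/listen tcp7
```

With this convention a service with no special needs requires no configuration beyond its own executable file.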
listen uses inotify to watch the /srv/listen/ dir for changes.
When notified of a change, listen scans the whole directory again:
- listen leaves alone running daemons
  - whose service script remains executable,
  - and whose content has not changed since it started (to achieve this, listen caches the hash of the script at start time, and compares the current hash with the cached one during the new scan),
- listen restarts running daemons
  - whose service script remains executable,
  - but whose content changed since they started,
- listen stops every running daemon whose service script has lost its execution bit,
- listen starts every service script that has become executable.
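The rescan rules can be condensed into a single decision function. This is a sketch under assumptions, not the code of listen: the name rescan_action is hypothetical, sha256sum stands in for whichever hash listen actually uses, and an empty cached hash is taken to mean that no daemon is currently running for that script.

```shell
#!/usr/bin/env bash
# Decide what to do with one service script during a rescan.
#   $1  path of the service script
#   $2  hash cached when its daemon was started, empty if none is running
# Prints one of: keep, restart, stop, start, ignore.
rescan_action() {
    local script=$1 cached_hash=$2 current_hash
    if [ ! -x "$script" ]; then
        # No execution bit: stop a running daemon, otherwise nothing to do.
        if [ -n "$cached_hash" ]; then echo stop; else echo ignore; fi
        return
    fi
    if [ -z "$cached_hash" ]; then
        # Executable and not running yet.
        echo start
        return
    fi
    # Executable and running: compare content with what was started.
    current_hash=$(sha256sum "$script" | cut -d ' ' -f 1)
    if [ "$current_hash" = "$cached_hash" ]; then echo keep; else echo restart; fi
}

script=$(mktemp)
echo 'cat' > "$script"
rescan_action "$script" ""   # not executable, not running
chmod +x "$script"
rescan_action "$script" ""   # executable, not running
```

Rescanning the whole directory and deriving the action from the file state keeps the logic independent of which inotify event triggered the scan.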
7.2. First security layer: almost unprivileged user listen
The listen process runs owned by nobody-like user listen.
This protects user data
from unauthorized reads or writes
should listen become compromised or misbehave.
As a second line of defence,
we would have liked listen to run in a container,
seeing of the system only what its job requires (note that Plan 9 or Inferno make this trivial):
- GNU Guix's paths:
  - the read-only store /gnu/store/ where GNU Guix installs all software,
  - the /var/guix/ dir to communicate with the guix daemon,
- the service scripts in /srv/listen/,
- the log directory in /var/log/listen,
- a runtime directory in /run/listen/,
- the host's network.
Alas, Linux mirrors our low-growth, high-capital-returns economy: privilege
remains an inherited property (Piketty and Zucman 2015). Despite Linux namespaces
and capabilities, this inheritance model prevents unprivileged access to
non-file resources like ports from a container [personal communication with the
kernel developers]. One example: Linux does not honor setuid or setcap
binaries within containers. Solving this issue would require
- either moving away from the inheritable root-owns-everything model, or its somewhat finer-grained capabilities substitute,
- or exposing non-file resources like ports as files.
We must for now fall back on basic Linux inter-process isolation.
To prevent any other user
from messing with the ports listen manages,
we set all ports as privileged.
Despite their complexity and coarse-grainedness,
we make use of capabilities:
user listen can remain wholly unprivileged but for its ability to bind to all ports.
A good step in the right direction from inetd mandatorily running as root!
We use Linux capabilities to grant user listen the right to bind to privileged ports (i.e. all the ports),
using a setcap-ed wrapper named with-cap-bind,
that only user listen can execute.
7.3. Second security layer: service script containerization
Despite listen having quite a low footprint on the system,
it remains a sensitive target:
- it can access the internet,
- it can listen on any port,
- it can kill or mess with existing service script processes.
Yet, the service script processes constitute
a more probable target than listen itself.
Indeed they do not just pass data around as listen does:
they handle this untrusted user data.
Pirates use complex operations on untrusted data,
e.g. parsing, as a typical attack route.
Because service scripts present a big attack surface, they must run in an environment so isolated that a full takeover would not negatively impact the rest of the system much.
To this effect they run in containers.
GNU Guix's guix shell creates the service containers.
They only expose:
- GNU Guix paths,
- the service-specific (e.g. /run/listen/tcp7) runtime directory,
- the service-specific (e.g. /var/log/listen/tcp7) log directory.
Note that the script cannot access the network!
Communication with the outside happens
through the UNIX domain socket /run/listen/tcp7/socket.
7.4. Passing data to the unnetworked daemon
The /srv/listen/tcp* service scripts get their data via the /run/listen/tcp*/socket socket (note that this path appears as /run/socket to the service script, see figure 4).
The brilliant utility socat
acts
as plumbing between any two kinds of process input/output
(sockets, files, standard input/output/error, etc.).
For e.g. port 7, listen
calls it this way:
    /run/setuid-programs/with-cap-bind socat -dd \
      TCP4-LISTEN:7,fork,reuseaddr \
      UNIX-connect:/run/listen/tcp7/socket \
      >> /var/log/listen/tcp7/listen.log 2>&1 &
socat
runs with the following wrapper, arguments, and options:
- with-cap-bind: grants socat the right to bind to privileged ports (see section 7.2).
- -dd: enables logging the remote IP addresses. This then feeds tools like fail2ban to react against abuse.
- fork: connect to the UNIX domain socket each time a client connects to the TCP socket.
- reuseaddr: the socat daemon might die with its TCP socket on port 7 in the TIME_WAIT state. With the socket in this state, no new socat daemon can start again: it would get an "address already in use" error. Going out of the TIME_WAIT state can take up to 4 minutes. To avoid such a delay, socat ignores the TIME_WAIT state thanks to the reuseaddr option.
Interested readers will find an explanation
of the TIME_WAIT
state
in the UNIX socket FAQ.
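The port given to socat comes from the name of the service file, which follows the <protocol><port> convention. A minimal sketch of that derivation follows; service_port and service_proto are hypothetical helper names, and the actual listen code may well proceed differently.

```sh
#!/bin/sh
# Hypothetical helpers: split a service file name such as "tcp79" into its
# port ("79") and protocol ("tcp") using POSIX parameter expansion.
service_port() {
    # Remove the longest prefix that ends with a non-digit character.
    echo "${1##*[!0-9]}"
}
service_proto() {
    port=$(service_port "$1")
    # Remove the trailing port digits.
    echo "${1%"$port"}"
}

for name in tcp7 tcp79 finger; do
    port=$(service_port "$name")
    if [ -n "$port" ]; then
        echo "$name: protocol $(service_proto "$name"), port $port"
    else
        echo "$name: no port in its name"
    fi
done
```

A name without trailing digits, such as finger, yields an empty port, which gives a simple way to tell port-named files from other files in the same directory.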
7.5. Aliases
Port numbers make for a terrible user interface. To use service names instead, one just has to name the service script after the service and symbolically link the port-named file to it, e.g.
ln -s /srv/listen/finger /srv/listen/tcp79
This allows the owner to edit /srv/listen/finger
to change the finger service,
instead of having to remember that finger clients connect to port 79.
It also allows authenticated users to start the script by name:
ssh alice@the-dam.org /srv/listen/finger
The script will have an effective user id of alice (unless the script's author has set its setuid bit),
and might therefore give more information than the anonymous version accessible to anyone.
By contrast, listen
starts the anonymous version
with an effective user id of user listen
,
who can not read much on the system.
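The aliasing mechanism can be tried out safely in a scratch directory. The sketch below uses a temporary directory as a stand-in for /srv/listen and a relative link instead of the absolute one shown above; the script contents are placeholders.

```sh
#!/bin/sh
# Work in a throwaway directory standing in for /srv/listen.
dir=$(mktemp -d)
cd "$dir" || exit 1

# The service script lives under its human-readable name.
printf '#!/bin/sh\necho "finger service v1"\n' > finger
chmod +x finger

# The port-named entry is only a symbolic link to it.
ln -s finger tcp79

readlink tcp79   # prints the link target
sh ./tcp79       # runs the script through the link

# Editing the named script changes what the port-named entry runs.
printf '#!/bin/sh\necho "finger service v2"\n' > finger
sh ./tcp79
```

Because the port-named entry holds no content of its own, there is a single file to edit and a single file whose ownership and permissions matter.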
8. Alternatives to listen
8.1. Plan 9's listen and Inferno's svc
A study of the source code of the fourth edition of Plan 9, 9front, and inferno,
together with the listen service script examples we asked 9fans for (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024),
reveals some usage patterns.
Namespace files: while in the default installation few services appear to have
a custom tcpXXX.namespace
file, some services like ftp and http do. The others
use the site-specific default namespace file /lib/namespace
. These put the
files to be served where the service expects them, without any complex software
configuration mechanism.
Seamless authentication: On Plan 9 and inferno, a user's identity spans a whole
cluster, not just a single machine. Authentication is baked into the 9P
protocol, and requires either less than 15 lines of boilerplate, or direct
handling by some standard middleware. Switching a process' identity, say from
alice
to bob
, requires the caller to prove to the auth
server that it
knows bob
's secret. This is automagically handled by the factotum
at both
ends of the connection. The namespace files benefit implicitly from that
mechanism, provided the currently mounted factotum
knows the required secrets.
Simple scripts: Only the two simplest services (the echo
and discard
protocols) in the default installation are fully implemented as shell scripts.
The rest are custom servers written usually in C for Plan 9 and Limbo for
inferno, or thin shell wrappers around those custom servers. Reports from the
9fans mailing list, however (“Content of Your /Rc/Bin/Service or /Dis/Svc ? – 9fans Thread” 2024), praise the ease with which one can write
a custom network service for a specific use case, such as collecting vac
hashes after a backup, a text-based zine, a connection helper to some MUDs, etc.
/net
usage: The synthetic /net
hierarchy (Presotto and Winterbottom 1993)
is widely used among the listen
service scripts, which get it as an argument.
As with all the resources on Plan 9, this part of the filesystem can be exported
and shared, in total or in part, between different processes on different
machines.
We want to transpose the patterns to Linux, despite the lack of integration of two of the basic primitives of the Plan 9 experience:
- per-process view of the filesystem (namespaces),
- unprivileged modification of said view.
This lack of integration on Linux was compensated by
- the use of Guix containers (which offer a per-container view of the filesystem, see section 7.3),
- and our new f29p FUSE 9P client, which allows for unprivileged mounting of 9P2000.L servers from within a container, something that was absolutely impossible before, see section 2.2. (While technically true, this sentence disregards the fact that some client/server pairs will work from within a container, see the aside in section 2.2.)
Namespace scripts replace namespace files, as there are no newns
or addns
on Linux.
Process and file ownership and permission make a poor, but workable,
substitute for the seamless 9P authentication mechanism of Plan 9. As detailed in
section 6.3, one can run a 9P server as one identity, and use the socket's
ownership and permission to allow another user to benefit from this identity's
rights. The alternative, which inetd
uses, would be to run as root
.
/net
is replaced by a single socket from which the service script can talk
to the incoming connection. A Linux network namespace isolates the service
script from the host's network. Compromises made due to the complex and mostly
undocumented interactions between Linux namespaces and Linux capabilities (see
section 9.2) make porting /net
to Linux a very worthwhile target to
improve listen
.
8.2. (x)inetd
On Linux and BSD, inetd
used to be the tool that would, like listen
, launch
a network service when a client connects and forward data back and forth between
the connection and the service's standard streams.
inetd
has fallen out of fashion:
- Most servers have grown in complexity and now do request management on their own. inetd listens on behalf of several servers, and these servers start on demand when a request comes. This saves CPU and memory, but increases latency, and costs one process per request.
  - The CPU and memory gains decreased to insignificance because of decreasing hardware costs.
  - Conversely, from a business perspective, the cost of latency has gone up.
  - Processes cost more than threads (on Linux). Self-managed servers (such as nginx) use a single process (sometimes one process per core) and handle each request in a thread or an event loop. This makes inetd more expensive under load than a single more complex server.
- Nowadays, people use containers (with podman, docker, kubernetes, etc.). Containers typically simulate a whole operating system to run one or a small handful of services. Container orchestration software fulfills inetd's role and more.
- Like most good things from UNIX, inetd's role has now been taken over by systemd, the cancer that will eat Linux's userspace. Almost all modern distributions lack an inetd (except for a handful of sensible Linux distributions like GNU Guix). BSDs have kept their sanity and their versions of inetd.
inetd
reads its configuration from a text file,
typically /etc/inetd.conf
.
Editing this file requires privileged access.
This leads to difficulties in a multi-user context. listen
addresses this.
inetd
also provides each service process with a complete view of the system,
limited only by the ownership under which said service process starts.
In contrast, listen
places security first
by running the service scripts in an initially void namespace.
8.3. Authbind
authbind
has existed since 1998.
It allows access to privileged ports on a per-user and per-port basis.
It does so by masking an application's call to bind
.
The application ends up calling authbind
's version,
which uses a privileged program to call the real bind
in the application's stead.
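As far as we understand authbind's documentation, its per-port permission is expressed with plain files: binding to a port is authorised when /etc/authbind/byport/<port> is executable by the calling user. The sketch below mimics that single rule in a scratch directory; authbind_allows is a hypothetical helper, and authbind's other rules (by address, by uid) are left out.

```sh
#!/bin/sh
# Hypothetical helper mimicking authbind's "byport" rule: binding to PORT is
# authorised when CONFDIR/byport/PORT is executable by the calling user.
authbind_allows() {
    confdir=$1 port=$2
    [ -x "$confdir/byport/$port" ]
}

# Throwaway directory standing in for /etc/authbind.
conf=$(mktemp -d)
mkdir "$conf/byport"
touch "$conf/byport/79"
chmod 500 "$conf/byport/79"   # executable by its owner only

authbind_allows "$conf" 79 && echo "port 79: allowed"
authbind_allows "$conf" 80 || echo "port 80: denied"
```

This is the same idea as one file per port, applied to permissions only: the file grants the right to bind but does not say which program serves the port.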
8.4. Systemd
systemd
offers to run ephemeral services, as inetd does. However using this
functionality means you also have to choose systemd for everything else the
systemd developers bullied their way into (including, but not limited to: your
login manager, your desktop environment, your DNS, your logs).
9. Further work and position
9.1. Unprivileged 9P mounts
When we began this adaptation work, we naively believed that Linux capabilities, containers, and the Linux kernel's support of 9P would allow us to emulate Plan 9's per-process namespace on Linux.
The first bad surprise happened when we discovered that within a Linux mount namespace, one can only mount a handful of filesystem types, the only non-trivial one of which is FUSE. 9P is excluded. We therefore had to write a 9P-to-FUSE wrapper (see section 2.2) instead of relying on the kernel's v9fs.
We would like to work in the direction of a patch that would allow in-container unprivileged mounts of 9P filesystems on Linux.
9.2. Porting /net, to avoid mixing capabilities and Linux namespaces
Another bad surprise happened when we discovered that first setuid binaries
and, worse, setcap binaries had no effect in Linux user namespaces. After
emailing the kernel developers themselves, we got confirmation that what we
wanted to achieve, namely using a setcap binary to grant listen the ability to bind
to privileged ports from within a Linux user namespace, was impossible.
This makes it impossible to run listen in its own namespace, as we initially
wanted to do (see section 7.2). Instead, it gets a full view of the
system, limited only by its identity.
Porting /net to Linux would allow us to run listen in a container in which
the host's /net is 9P-imported, and to stop caring about the Linux capabilities feature,
of which it was mildly said that "coherence in its design and implementation are not
particularly evident" (https://lwn.net/Articles/632520).
9.3. Pledge
After running their initialization code, and just before exec-ing into
listen
's provided command, namespace scripts should freeze their namespaces,
and prevent the service script from doing anything not strictly necessary to
answer the clients' requests. One way to do that is to port OpenBSD's pledge
to bash. Justine has written an awesome port of pledge() to Linux. We wish to
incorporate it in a loadable bash builtin, which would allow pledges to happen
mid-process in bash, and not just as a wrapper to a command.
9.4. Tighten the capable wrapper
listen
's ability to bind on privileged ports hinges on the with-cap-bind
wrapper.
As of now, this wrapper supports any command.
Its only use consists of a socat
invocation
that redirects data from a port to a service container's /run/socket
:
    /run/setuid-programs/with-cap-bind "$(which socat)" -dd \
      TCP4-LISTEN:"$port",fork,reuseaddr \
      UNIX-connect:/run/listen/"$service"/socket \
      >>/var/log/listen/"$service"/listen.log 2>&1 &
We should make the port number the only argument to the wrapper,
and bake the socat
invocation into the wrapper.
That way, when an attacker gains remote code execution as user listen
,
they cannot bind arbitrary services to any port
unless it also gains write rights to /srv/listen
(which user listen
does not have).
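The argument handling of such a tightened wrapper could look like the sketch below. It is a dry run under stated assumptions: bind_port is a hypothetical name, the service directory is a parameter only so that the sketch can run against a scratch directory, and the command is printed rather than executed. Since Linux does not, to our knowledge, honour file capabilities on interpreted scripts, a real wrapper would have to be a compiled program; only the validation logic is illustrated here.

```sh
#!/bin/sh
# Hypothetical sketch: accept a port number and nothing else, then derive the
# whole socat command from it. SRV stands in for /srv/listen.
bind_port() {
    srv=$1 port=$2
    # Reject anything that is not a plain decimal number.
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    service=tcp$port
    # Only ports that have a service script may be bound.
    if [ ! -x "$srv/$service" ]; then
        echo "no service script for port $port" >&2
        return 1
    fi
    # Dry run: print the fixed command instead of executing it.
    echo socat -dd "TCP4-LISTEN:$port,fork,reuseaddr" \
        "UNIX-connect:/run/listen/$service/socket"
}

scratch=$(mktemp -d)
touch "$scratch/tcp7"
chmod +x "$scratch/tcp7"

bind_port "$scratch" 7
bind_port "$scratch" 22 || echo "port 22 rejected"
bind_port "$scratch" '7,fork' || echo "malformed argument rejected"
```

With the command fixed inside the wrapper, the only input left to the caller is a number, and that number is useful only if a matching file already exists in the service directory.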
9.5. Listen services
Our work on listen
started when looking for a clean way to write a finger
server and getting frustrated that none existed. Our finger
service being
operational, our sights are now set on new protocols aiming for simplicity like
gemini and nostr.
9.6. Preventing resource exhaustion
As mentioned in section 6.5, an attacker taking over a service process could exhaust the system's resources. We are working on a fix to that, but Linux' API for resource limits has complex interactions with Linux namespaces, that we do not fully understand yet.
9.7. Down with port numbers
Securing the listen
process proved the most frustrating part
of the design work.
For example: the question "how do unprivileged users bind to privileged ports ?" admits no good answer, because the question itself is wrong on so many levels:
- Port numbers themselves stem from TCP emerging from earlier protocols
(see the early RFCs 322, 349, 433 and those that obsolete them), and a clean design
would probably elect to eschew them, leveraging a \(2^{128}\) address space to
allow process-to-process communication, instead of the route-to-host, then
route-to-process dance we do now.
- The host to process frontier should be an implementation detail on the receiving end, not baked so deeply in the stack.
- This barrier may even change from request to request as new hosts come up or down depending on load.
- This already happens anyway with e.g. kubernetes, but we would have less cruft if it was baked into the protocol.
- We should use strings instead of numbers to specify the protocol.
- Apart from ease of implementation and historically underpowered hardware, why does one need to specify privileged ports as a single range ? One should be able to restrict access on a per-port basis.
- Even then, why should one manage ports as a monolithic entity that either all users, or no user but root, can access ?
- Ports should benefit from a fine-grained access control API, like files do. Exposing ports as virtual files would allow that.
10. Conclusion
The work herein proposed is but a leaky dam, failing to contain the waterweight of historical legacy.
11. Changelog
- Recognize that other 9P client/server couples work from within a container.