s6: service directories

Service directories

A service directory is a directory containing all the information related to a service, i.e. a long-running process maintained and supervised by s6-supervise.

(Strictly speaking, a service is not always equivalent to a long-running process. Things like Ethernet interfaces fit the definition of services one may want to supervise; however, s6 does not provide service supervision; it provides process supervision, and it is impractical to use the s6 architecture as is to supervise services that are not equivalent to one long-running process. However, we still use the terms service and service directory for historical and compatibility reasons.)

A service directory foo may contain the following elements:

An executable file named run. It can be any executable file (such as a binary file or a link to any other executable file), but most of the time it will be a script, called a run script. This file is the most important one in your service directory: it contains the commands that will set up and run your foo service.
- It is spawned by s6-supervise every time the service must be started, i.e. normally when s6-supervise starts, and whenever the service goes down when it is supposed to be up.
- It is given one argument, which is the same argument that the s6-supervise process is running with, i.e. the name of the service directory — or, if s6-supervise is run under s6-svscan, the name of the service directory as seen by s6-svscan in its scan directory. That is, foo or foo/log, if foo is the name of the symbolic link in the scan directory.
A run script should normally:
- adjust redirections for stdin, stdout and stderr. When a run script starts, it inherits its standard file descriptors from s6-supervise, which itself inherits them from s6-svscan. stdin is normally /dev/null. If s6-svscan was launched by another init system, stdout and stderr likely point to that init system's default log (or /dev/null in the case of sysvinit). If s6-svscan is running as pid 1 with the help of software like s6-linux-init, then its stdout and stderr point to a catch-all logger, which catches and logs any output of the supervision tree that has not been caught by a dedicated logger. If the defaults provided by your installation are not suitable for your run script, then your run script should perform the proper redirections before executing into the final daemon. For instance, dedicated logging mechanisms, such as the log subdirectory (see below) or the s6-rc pipeline feature, pipe your run script's stdout to the logging service, but chances are you want to log stderr as well, so the run script should make sure that its stderr goes into the log pipe. This is achieved by fdmove -c 2 1 in execline, and exec 2>&1 in shell.
- adjust the environment for your foo daemon. Normally the run script inherits its environment from s6-supervise, which normally inherits its environment from s6-svscan, which normally inherits a minimal environment from the boot scripts. Service-specific environment variables should be set in the run script.
- adjust other parameters for the foo daemon, such as its uid and gid. Normally the supervision tree, i.e. s6-svscan and the various s6-supervise processes, is run as root, so run scripts are also run as root; however, for security purposes, services should not run as root if they don't need to. You can use the s6-setuidgid utility in foo/run to lose privileges before executing into foo's long-lived process; or the s6-envuidgid utility if your long-lived process needs root privileges at start time but can drop them afterwards.
- execute into the long-lived process that is to be supervised by s6-supervise, i.e. the real foo daemon. That process must not "background itself": being run by a supervision tree already makes it a "background" task.
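
The four steps above can be sketched as a short shell run script. This is only an illustration, not a canonical template: the daemon name food, the account foouser and the FOO_CONFIG variable are hypothetical, and the temporary directory merely stands in for a real service directory.

```shell
#!/bin/sh
# Sketch only: "food", "foouser" and FOO_CONFIG are hypothetical names.
svcdir="$(mktemp -d)/foo"     # stand-in for the real service directory
mkdir -p "$svcdir"

cat > "$svcdir/run" <<'EOF'
#!/bin/sh
# 1. Redirections: send stderr wherever stdout goes (e.g. the log pipe).
exec 2>&1
# 2. Environment: service-specific variables are set here.
export FOO_CONFIG=/etc/foo.conf
# 3. and 4. Drop root privileges, then execute into the daemon itself,
#    which is assumed to stay in the foreground.
exec s6-setuidgid foouser food
EOF
chmod 0755 "$svcdir/run"
```

Every step ends in exec or runs before it, so the shell is replaced by the daemon and s6-supervise ends up supervising the daemon directly rather than a wrapper shell.
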
An optional executable file named finish. Like run, it can be any executable file. This finish script, if present, is executed every time the run script dies. Generally, its main purpose is to clean up non-volatile data such as the filesystem after the supervised process has been killed. If the foo service is supposed to be up, foo/run is restarted after foo/finish dies.
- By default, a finish script must do its work and exit in less than 5 seconds; if it takes more than that, it is killed. (The point is that the run script, not the finish script, should be running; the finish script should really be short-lived.) The maximum duration of a finish execution can be configured via the timeout-finish file, see below.
- The finish script is executed with four arguments:
  1. the exit code of the run script, or 256 if the run script was killed by a signal
  2. the number of the signal that killed the run script, or an undefined number if the run script exited on its own
  3. the name of the service directory, the same that has been given to ./run
  4. the process group id of the defunct run script. This is useful to clean up services that leave children behind: for instance, if test "$1" -gt 255 ; then kill -9 -- -"$4" ; fi in the finish script will SIGKILL all child processes if the service crashed. This is not an entirely reliable mechanism, because an annoying service could spawn child processes in a different process group, but it should catch most offenders.
- If the finish script exits 125, then s6-supervise interprets this as a permanent failure for the service, and does not restart it, as if an s6-svc -O command had been sent.
- If s6-supervise has been instructed to exit after the service dies, via an s6-svc -x command or a SIGHUP, then the next invocation of finish will (obviously) be the last, and it will run with stdin and stdout pointing to /dev/null.
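
A finish script that uses its four arguments might look like the sketch below. The kill line is the one quoted above. Treating exit code 100 from the run script as a permanent failure is a hypothetical convention chosen for illustration, not something s6 prescribes.

```shell
#!/bin/sh
# Sketch only: exit code 100 meaning "do not restart" is a hypothetical convention.
svcdir="$(mktemp -d)/foo"     # stand-in for the real service directory
mkdir -p "$svcdir"

cat > "$svcdir/finish" <<'EOF'
#!/bin/sh
# $1: exit code of ./run, or 256 if it was killed by a signal
# $2: number of the signal that killed ./run (undefined otherwise)
# $3: name of the service directory
# $4: process group id of the defunct run script
if test "$1" -gt 255 ; then
  echo "$3: killed by signal $2, cleaning up process group $4" >&2
  kill -9 -- -"$4"
elif test "$1" -eq 100 ; then
  # Tell s6-supervise this is a permanent failure: no restart.
  exit 125
fi
exit 0
EOF
chmod 0755 "$svcdir/finish"
```

The script does very little work, which keeps it well inside the default 5 second limit.
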
A directory named supervise. It is automatically created by s6-supervise if it does not exist. This is where s6-supervise stores its internal information. The directory must be writable.
An optional, empty, regular file named down. If such a file exists, the default state of the service is considered down, not up: s6-supervise will not automatically start it until it receives an s6-svc -u command. If no down file exists, the default state of the service is up.
An optional regular file named notification-fd. If such a file exists, it means that the service supports readiness notification. The file must only contain a nonzero unsigned integer, which is the number of the file descriptor that the service writes its readiness notification to. (For instance, it should be 1 if the daemon is s6-ipcserverd run with the -1 option.) When a service is started, or restarted, by s6-supervise, if this file exists and contains a valid descriptor number, s6-supervise will wait for the notification from the service and broadcast readiness, i.e. any s6-svwait -U, s6-svlisten1 -U or s6-svlisten -U processes will be triggered.
An optional regular file named lock-fd. If such a file exists, it must contain a nonzero unsigned integer, representing a file descriptor that will be open in the service. The service should not write to that descriptor and should not close it. In other words, it should totally ignore it. That file descriptor holds a lock, that will naturally be released when the service dies. The point of this feature is to prevent s6-supervise from accidentally spawning several copies of the service in case something goes wrong: for instance, the service backgrounds itself (which it shouldn't do when running under a supervision suite), or s6-supervise is killed, restarted by s6-svscan, and attempts to start another copy of the service while the first copy is still alive. If s6-supervise detects that the lock is held when it tries to start the service, it will print a warning message; the new service instance will block until the lock is released, then proceed as usual.
An optional regular file named timeout-kill. If such a file exists, it must only contain an unsigned integer t. If t is nonzero, then on receipt of an s6-svc -d command, which sends a SIGTERM (by default, see down-signal below) and a SIGCONT to the service, a timeout of t milliseconds is set; and if the service is still not dead after t milliseconds, then it is sent a SIGKILL. If timeout-kill does not exist, or contains 0 or an invalid value, then the service is never forcibly killed (unless, of course, an s6-svc -k command is sent).
An optional regular file named flag-timeout-killpg. If such a file exists and a nonzero timeout-kill is defined, then at the end of the timeout, the SIGKILL is sent to the whole process group of the service, instead of to the pid of the service only.
An optional regular file named timeout-finish. If such a file exists, it must only contain an unsigned integer, which is the number of milliseconds after which the ./finish script, if it exists, will be killed with a SIGKILL. The default is 5000: finish scripts are killed if they're still alive after 5 seconds. A value of 0 allows finish scripts to run forever.
An optional regular file named max-death-tally. If such a file exists, it must only contain an unsigned integer, which is the maximum number of service death events that s6-supervise will keep track of. If the service dies more than this number of times, the oldest events will be forgotten. Tracking death events is useful, for instance, when throttling service restarts. The value cannot be greater than 4096. If the file does not exist, a default of 100 is used.
An optional regular file named down-signal. If such a file exists, it must only contain the name or number of a signal, followed by a newline. This signal will be used to kill the supervised process when an s6-svc -d or s6-svc -r command is used. If the file does not exist, SIGTERM will be used by default.
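
Most of the optional files described so far are plain one-line files, so they can be created with touch and echo. The sketch below uses illustrative values only, and it assumes that the SIG-prefixed spelling of the signal name is accepted in down-signal.

```shell
#!/bin/sh
# Sketch only: the values are illustrative, not recommendations.
svcdir="$(mktemp -d)/foo"     # stand-in for the real service directory
mkdir -p "$svcdir"

touch "$svcdir/down"                     # default state is down until s6-svc -u
echo 3000   > "$svcdir/timeout-kill"     # SIGKILL 3000 ms after s6-svc -d if still alive
echo 10000  > "$svcdir/timeout-finish"   # allow ./finish 10 seconds instead of 5
echo 50     > "$svcdir/max-death-tally"  # remember at most 50 death events
echo SIGINT > "$svcdir/down-signal"      # assumed spelling; s6-svc -d and -r then send SIGINT
```
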
An optional regular file named flag-newpidns. If such a file exists:
- On Linux (and potentially in the future, other systems that implement such functionality): at service starting time, the ./run script will be spawned in a new PID namespace. It will be pid 1 in that namespace.
- On systems that do not support the functionality: the service will fail to start, so do not create this file if you're unsure. (Yes, it is a better behaviour than ignoring the flag. Having the flag be silently ignored on some systems would be very bad.)
A fifodir named event. It is automatically created by s6-supervise if it does not exist. foo/event is the rendez-vous point for listeners, where s6-supervise will send notifications when the service goes up or down.
Optional directories named instance and instances. Those are internal subdirectories created by s6-instance-maker in a templated service directory. Outside of instanced services, these directories should never appear, and you should never create them manually.
An optional service directory named log. If it exists and foo is in a scandir, and s6-svscan runs on that scandir, then two services are monitored: foo and foo/log. A pipe is open and maintained between foo and foo/log, i.e. everything that foo/run writes to its stdout will appear on foo/log/run's stdin. The foo service is said to be logged; the foo/log service is called foo's logger. A logger service cannot be logged: if foo/log/log exists, nothing special happens.
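
As an illustration of a logged service, a logger's run script could execute into s6-log, which reads log lines on its stdin. The account foolog and the directory /var/log/foo are hypothetical; the log directory would have to be writable by that account.

```shell
#!/bin/sh
# Sketch only: "foolog" and /var/log/foo are hypothetical.
svcdir="$(mktemp -d)/foo"     # stand-in for the real service directory
mkdir -p "$svcdir/log"

cat > "$svcdir/log/run" <<'EOF'
#!/bin/sh
# stdin is the pipe fed by foo/run's stdout.
# n20: keep 20 archived files; s1000000: rotate at about 1 MB;
# t: prepend a TAI64N timestamp; then the directory to log into.
exec s6-setuidgid foolog s6-log n20 s1000000 t /var/log/foo
EOF
chmod 0755 "$svcdir/log/run"
```

For stderr to reach this logger as well, foo/run has to redirect it to stdout itself, for example with exec 2>&1.
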

Stability

As s6 evolves, it is possible that s6-supervise configuration will use more and more files in the service directory. The notification-fd and timeout-finish files, for instance, appeared in 2015; users who previously had files with the same names had to rename them. There is no guarantee that s6-supervise will not use additional names in the service directory in the same fashion in the future.

There is, however, a guarantee that s6-supervise will never touch subdirectories named data or env. So if you need to store user information in the service directory with the guarantee that it will never be mistaken for a configuration file, no matter the version of s6, you should store that information in the data or env subdirectories of the service directory.
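
One way to use the reserved env subdirectory is to keep one file per environment variable there and load them with s6-envdir from the run script. This is a sketch under assumptions: food and FOO_CONFIG are hypothetical names, and the relative path ./env assumes the run script is started with the service directory as its working directory.

```shell
#!/bin/sh
# Sketch only: "food" and FOO_CONFIG are hypothetical names.
svcdir="$(mktemp -d)/foo"     # stand-in for the real service directory
mkdir -p "$svcdir/env" "$svcdir/data"

# One file per variable: the file name is the variable name.
echo /etc/foo.conf > "$svcdir/env/FOO_CONFIG"

cat > "$svcdir/run" <<'EOF'
#!/bin/sh
exec 2>&1
# Load the variables stored in ./env, then execute into the daemon.
exec s6-envdir ./env food
EOF
chmod 0755 "$svcdir/run"
```
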

Where should I store my service directories?

Service directories describe the way services are launched. Once they are designed, they have little reason to change on a given machine. They can theoretically reside on a read-only filesystem - for instance, the root filesystem, to avoid problems with mounting failures.

However, two subdirectories - namely supervise and event - of every service directory need to be writable. So it has to be a bit more complex. Here are a few possibilities.

The laziest option: you're not using s6-svscan as process 1, you're only using it to start a collection of services, and your booting process is already handled by another init system. Then you can just store your service directories and your scan directory on some read-write filesystem such as /var; and you tell your init system to launch (and, if possible, maintain) s6-svscan on the scan directory after that filesystem is mounted.
The almost-as-lazy option: just have the service directories on the root filesystem. Then your service directory collection is for instance in /etc/services and you have a /service scan directory containing symlinks to that collection. This is the easy setup, which does not require an external init system to mount your filesystems. However, it requires your root filesystem to be read-write, which is unacceptable if you are concerned with reliability, for instance if you are designing an embedded platform.
Some people like to have their service directories in a read-only filesystem, with supervise symlinks pointing to various places in writable filesystems. This setup looks a bit complex to me: it requires careful handling of the writable filesystems, with not much room for error if the directory structure does not match the symlinks (which are then dangling). But it works.
Service directories are usually small; most daemons store their information elsewhere. Even a complete set of service directories often amounts to less than a megabyte of data - sometimes much less. Knowing this, it makes sense to have an image of your service directories in the (possibly read-only) root filesystem, and copy it all to a scan directory located on a RAM filesystem that is mounted at boot time. This is the setup I recommend, and the one used by the s6-rc service manager. It has several advantages:
- Your service directories reside on the root filesystem and are not modified during the lifetime of the system. If your root filesystem is read-only and you have a working set of service directories, you have the guarantee that a reboot will set your system in a working state.
- Every boot system requires an early writable filesystem, and many create it in RAM. You can take advantage of this to copy your service directories early and run s6-svscan early.
- No dangling symlinks or potential problems with unmounted filesystems: this setup is robust. A simple /bin/cp -a or tar -x is all it takes to get a working service infrastructure.
- You can make temporary modifications to your service directories without affecting the main ones, safely stored on the disk. Conversely, every boot ensures clean service directories - including freshly created supervise and event subdirectories. No stale files can make your system unstable.
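
The copy step of the recommended setup can be sketched as follows. Both paths are temporary stand-ins created for the demonstration: on a real system the image might live somewhere like /etc/services and the scan directory on a RAM filesystem mounted at boot. The single service foo and its daemon food are hypothetical.

```shell
#!/bin/sh
# Sketch only: both directories are stand-ins for real locations.
image="$(mktemp -d)"      # image of the service directories (root filesystem)
scandir="$(mktemp -d)"    # scan directory (RAM filesystem)

# A tiny image with a single hypothetical service, for demonstration.
mkdir -p "$image/foo"
printf '#!/bin/sh\nexec food\n' > "$image/foo/run"
chmod 0755 "$image/foo/run"

# Copy the whole image, preserving modes and symlinks.
cp -a "$image/." "$scandir/"

# s6-svscan would then be started on the copy, for instance:
#   s6-svscan "$scandir"
```

Because supervise and event are created inside the copy, the image itself is never written to.
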
