Service startup notifications

It is easy for a process supervision suite to know when a service that was up is now down: the long-lived process implementing the service is dead. The supervisor, running as the daemon's parent, is instantly notified via a SIGCHLD. When that happens, s6-supervise sends a 'd' event to its ./event fifodir, so every subscriber knows that the service is down. All is well.

It is much trickier for a process supervision suite to know when a service that was down is now up. The supervisor forks and execs the daemon, and knows when the exec has succeeded; but after that point, it's all up to the daemon itself. Some daemons do a lot of initialization work before they're actually ready to serve, and it is impossible for the supervisor to know exactly when the service is really ready. s6-supervise sends a 'u' event to its ./event fifodir when it successfully spawns the daemon, but any subscriber reacting to 'u' is subject to a race condition - the service provided by the daemon may not be ready yet.

Reliable startup notifications need support from the daemons themselves. Daemons should do two things to signal the outside world that they are ready:

  1. Update a state file, so other processes can get a snapshot of the daemon's state
  2. Send an event to processes waiting for a state change.

This is complex to implement in every single daemon, so s6 provides tools to make it easier for daemon authors, without any need to link against the s6 library or use any s6-specific construct: daemons can simply write a line to a file descriptor of their choice, then close that file descriptor, when they're ready to serve. This is a generic mechanism that some daemons already implement.

s6 supports that mechanism natively: when the service directory for the daemon contains a valid notification-fd file, the daemon's supervisor, i.e. the s6-supervise program, will properly catch the daemon's message, update the status file (supervise/status), then notify all the subscribers with a 'U' event, meaning that the service is now up and ready.
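To make the reading side of this exchange more concrete, here is a minimal sketch in C of a function that waits on a notification descriptor until the first newline arrives. It is only an illustration of the protocol, not s6-supervise's actual code: the name wait_for_ready_line is hypothetical, and a real supervisor would multiplex this read with its other duties before updating supervise/status and sending the 'U' event.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Returns  1 if a newline was received (the daemon announced readiness),
            0 if the writer closed the descriptor without sending one,
           -1 on a read error (errno is set). */
int wait_for_ready_line(int fd)
{
    char buf[64];

    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r == -1) {
            if (errno == EINTR) continue;  /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0) return 0;              /* EOF before any newline */
        for (ssize_t i = 0; i < r; i++)
            if (buf[i] == '\n') return 1;  /* first newline means "ready" */
    }
}
```

In this sketch, anything the daemon writes before the newline is simply ignored, and a daemon that dies or closes the descriptor early is reported as never having become ready.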

This method should really be implemented in every long-running program providing a service. When it is not the case, it's impossible to provide reliable startup notifications, and subscribers should then be content with the unreliable 'u' events provided by s6-supervise.

Unfortunately, a lot of long-running programs do not offer that functionality; instead, they provide a way to poll them: an external program that runs and checks whether the service is ready. This is a bad mechanism, for several reasons. Nevertheless, until all daemons are patched to notify their own readiness, s6 provides a way to run such a check program to poll for readiness, and route its result into the s6 notification system: s6-notifyoncheck.

How to use a check program with s6 (i.e. readiness checking via polling)
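As an illustration of the kind of check program meant here, the following C sketch probes whether something accepts TCP connections on the local machine. Everything in it is an assumption made for the example: the function name check_tcp_ready is hypothetical, the daemon is assumed to listen on a TCP port on 127.0.0.1, and the result follows the usual convention that 0 means "ready" and nonzero means "not ready yet".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical readiness probe, for illustration only.
   Returns 0 if a TCP connection to 127.0.0.1:port succeeds,
   1 otherwise (nothing listening yet, or any other failure). */
int check_tcp_ready(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return 1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* A successful connect is taken as "the service is ready". */
    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok ? 0 : 1;
}
```

A standalone check program would return this value from main as its exit code, so that a poller such as s6-notifyoncheck can run it repeatedly until it succeeds.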

How to design a daemon so it uses the s6 mechanism without resorting to polling (i.e. readiness notification)

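The s6-notifyoncheck mechanism was made to accommodate daemons that provide a check program but do not notify readiness themselves; it works, but is suboptimal. If you are writing the foo daemon, here is how you can make things better: once foo is ready to serve, have it write a line to a file descriptor chosen by the user, then close that file descriptor.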
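A minimal sketch of the daemon side, in C, could look like the following. The helper name notify_ready is hypothetical, and how foo learns the descriptor number (for instance through a run-time option) is left as an assumption; the only part taken from the mechanism described above is "write a line, then close the descriptor".

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Announces readiness by writing a single newline to notif_fd,
   then closes the descriptor. Returns 0 on success, -1 on error
   (errno is set). */
int notify_ready(int notif_fd)
{
    ssize_t r;

    do {
        r = write(notif_fd, "\n", 1);
    } while (r == -1 && errno == EINTR);   /* retry if interrupted */

    if (r != 1) {
        int e = errno;
        close(notif_fd);                   /* do not leak the descriptor */
        errno = e;
        return -1;
    }
    return close(notif_fd);                /* closing ends the notification */
}
```

The daemon would leave the descriptor untouched during initialization and call notify_ready exactly once, at the point where it can actually serve requests.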

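The user who then makes foo run under s6 just has to give foo's service directory a valid notification-fd file, and start foo so that it notifies readiness on the corresponding file descriptor.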

What does s6-supervise do with this readiness information?