s6-rc design ; comparison with anopa from Laurent Bercot on 2015-04-23 (skaware)

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Thu, 23 Apr 2015 17:40:33 +0200

  So, I've been planning to write s6-rc, a complete startup/shutdown script
system based on s6, with complete dependency management, and of course
optimal parallelization - a real init system done right.
  I worked on the design, and I think I have it more or less down; and I
started coding.

  Then Olivier released anopa: http://jjacky.com/anopa/

  anopa is pretty close to my vision. It's well-designed. It's good.
There *are* essential differences with s6-rc, though, and some of them
are important enough that I don't want to immediately stop writing s6-rc
and start endorsing anopa instead.

  This post tries to explain how s6-rc is supposed to work, and how it
differs from anopa, and why I find the differences important. What I hope
to achieve is a design discussion, with Olivier of course, but also other
people interested in the subject, on how an ideal init system should work.

  My goals are to:
  - reach a decision point: should I keep writing s6-rc or drop it ?
Dropping it can probably only happen if Olivier agrees on making a few
modifications to anopa, based on the present discussion, but I don't
think it will be the case because some of those modifications are
pretty hardcore.
  - if I keep writing s6-rc: benefit from this discussion and from
Olivier's experience to avoid pitfalls or designs that would not stand
the test of real-life situations.

  So, on to it.

  Three kinds of services
  -----------------------

  Like anopa, s6-rc works internally with two kinds of services: longrun,
which is simply defined by a service directory that will be directly
managed by s6, and oneshot, which is defined by a directory containing
data (a start script, a stop script, and some optional stuff).

  s6-rc allows the user to provide a third kind of service: a "bundle".
A bundle is simply a set of other services. Starting a bundle means
starting all the services contained in the bundle.
  A bundle can be used to emulate a SysV runlevel: the user can put all the
services he needs into a single bundle, then tell s6-rc to change the machine
state to "exactly that bundle".
  Bundles can of course contain other bundles.

  Oneshots and longruns are called atomic services, as opposed to bundles,
which are not atomic.
  Bundles are useful for the user, because "oneshot" and "longrun" are
often too small a granularity. For instance, the "Samba" service is made
of two longruns, smbd and nmbd, but it's still a single service. So,
samba would be a bundle containing smbd and nmbd.

  Also, the smbd daemon itself could want its own logger, smbd-log.
Correct daemon operation depends on the existence of a logger (a daemon
cannot start if its logger isn't working). So smbd would actually be a
bundle of two longruns, smbd-run (which is the smbd process itself) and
smbd-log (which is the logger process), and smbd-run would depend on
smbd-log.

  Users who want to start Samba don't want to deal with smbd-run, smbd-log,
nmbd-run and nmbd-log manually, so they would just start "samba", and
s6-rc would resolve "samba" to the proper set of atomic services.
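
  The resolution step described above can be pictured with a small Python
sketch. This is a toy model, not s6-rc's actual code: the dict layout and
the resolve() helper are assumptions made for illustration, and only the
service names come from the Samba example.

```python
# Toy model of bundle resolution; the data layout is an assumption.
ATOMIC = {"smbd-run", "smbd-log", "nmbd-run", "nmbd-log"}

BUNDLES = {
    "smbd": ["smbd-run", "smbd-log"],
    "nmbd": ["nmbd-run", "nmbd-log"],
    "samba": ["smbd", "nmbd"],  # bundles can contain other bundles
}


def resolve(name, path=()):
    """Return the set of atomic services that `name` expands to."""
    if name in ATOMIC:
        return {name}
    if name not in BUNDLES:
        raise KeyError(f"unknown service: {name}")
    if name in path:
        raise ValueError(f"bundle {name} contains itself")
    result = set()
    for member in BUNDLES[name]:
        result |= resolve(member, path + (name,))
    return result


print(sorted(resolve("samba")))
# ['nmbd-log', 'nmbd-run', 'smbd-log', 'smbd-run']
```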

  Source, compiled and live
  -------------------------

  Unlike anopa, s6-rc does not operate directly at run-time on the
user-provided service definitions. Why ? Because user-provided data is
error-prone, and boot time is a horrible time for debugging. Also, s6-rc
uses a complete graph of all services for dependency management, and
generating that graph at run-time is costly.

  Instead, s6-rc provides a "s6-rc-compile" utility that takes the
user-provided service definitions, the "source", and compiles them into
binary form in a place in the root filesystem, the "compiled".

  At run-time, s6-rc ignores the source, but reads its data from the
compiled, which can be on a read-only filesystem. It also needs a
read-write place to maintain information about its state; this place is
called the "live". Unlike the compiled, the live is small: it can reside
in RAM.

  The point of this separation is multifold: efficiency (all checks,
parsing and graph generation performed at compile-time), safety (the
compiled can be write-protected), and clarity (separation of user-
modifiable data, current configuration data, and current live data).

  An atomic service can be very small: it can be a single line of shell
for a oneshot, for instance. I fully expect package developers to
produce source definitions with multiple atomic services (and dependencies
between those services) and a bundle representing the whole package.
I expect the total number of atomic services on a typical reasonably
loaded machine to be around a thousand. Yes, it can grow very fast -
so having a compiled database isn't a luxury.

  Run-time
  --------

  At run-time, s6-rc only works in *stage 2*.
  That is important, and one of the few things I do not like in anopa:
stage 1 should be completely off-limits to any tool.

  s6-rc only wants a machine with a s6-svscan running on a scandir. It does
not care what happened before. It does not care whether s6-svscan is
process 1 or not.

  This does not mean s6-rc cannot handle one-time initialization. On the
contrary, my view is that one-time initialization should be deferred to
stage 2 as much as possible, with an absolutely minimal stage 1. For those
who want to run s6-svscan as process 1 (and they're right), I intend to
work on a s6-init package that will provide suitable minimal stage 1s
depending on the OS and user configuration; they will start stage 2 with
s6-svscan running on an empty scandir - save the catch-all logger and
maybe a getty.

  In stage 2, the user should start by running the "s6-rc-init" program,
which is roughly the equivalent of anopa's "aa-enable".
s6-rc-init will initialize the live area, and also start all the
supervisors for all the defined longrun services (so that notifications
work properly later on). Service directories are copied from the compiled
to the live, and initially they all have a down file so the supervisors
are started but not the services. Down files, like the rest of service
directories, are managed directly by s6-rc: once the user relinquishes
her machine state management to s6-rc, she does not tinker manually with
service directories ever again.
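
  As a rough illustration of the copy step only, here is a Python sketch.
The init_live() helper and the directory layout are hypothetical, and the
sketch leaves out everything else s6-rc-init has to do (in particular,
getting the supervisors started).

```python
import shutil
from pathlib import Path


def init_live(compiled_servicedirs, live_servicedirs):
    """Copy every service directory and mark it down.

    Both arguments are directories whose names are assumptions of this
    sketch; each subdirectory is treated as one longrun service directory.
    """
    compiled_servicedirs = Path(compiled_servicedirs)
    live_servicedirs = Path(live_servicedirs)
    live_servicedirs.mkdir(parents=True, exist_ok=True)
    names = []
    for src in sorted(compiled_servicedirs.iterdir()):
        if not src.is_dir():
            continue
        dst = live_servicedirs / src.name
        shutil.copytree(src, dst)
        # The down file keeps the service itself from starting once a
        # supervisor is spawned on the directory.
        (dst / "down").touch()
        names.append(src.name)
    return names
```

  The compiled side is only read, so it can stay on a read-only filesystem;
all writes go to the live side.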

  After s6-rc-init has been run, the user can simply invoke the service
management engine, the "s6-rc" program itself. "s6-rc -u servicelist"
will bring up all the services in servicelist. servicelist can contain
bundle names: s6-rc will first resolve everything into a set of atomic
services, then start everything, beginning with the services it needs to
bring up and that have no dependencies. As soon as the dependencies are
solved for a service belonging to the set, s6-rc starts this service.
  s6-rc exits when it has no more services in waiting.
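
  To make the ordering concrete, here is a simplified Python sketch. It
groups services into "waves" that could be started in parallel, which is
coarser than starting each service the moment its own dependencies are up;
the start_waves() name and the dict-based graph are assumptions of the
sketch, not s6-rc internals.

```python
def start_waves(wanted, deps):
    """Group `wanted` plus its dependencies into parallel start waves.

    `deps` maps an atomic service to the services it depends on.
    """
    # Collect everything that has to be up: the closure of `wanted`.
    todo, stack = set(), list(wanted)
    while stack:
        name = stack.pop()
        if name not in todo:
            todo.add(name)
            stack.extend(deps.get(name, ()))

    up, waves = set(), []
    while todo:
        # A service is ready once all of its dependencies are up.
        ready = sorted(s for s in todo if set(deps.get(s, ())) <= up)
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(ready)
        up.update(ready)
        todo.difference_update(ready)
    return waves


deps = {"smbd-run": ["smbd-log"], "nmbd-run": ["nmbd-log"]}
print(start_waves(["smbd-run", "nmbd-run"], deps))
# [['nmbd-log', 'smbd-log'], ['nmbd-run', 'smbd-run']]
```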

  The s6-rc program itself is pretty small. I have finished writing its code:
the source is less than 25 kB long. All the complexity has been moved to
the data structures - basically to s6-rc-compile. I like the idea that
the main engine, the program that actually starts and stops services and
that the boot process lives by, is small and simple; all the hard stuff
is handled offline.
  Oh, and s6-rc does not use malloc. :)

  If an error occurs, i.e. a start script fails, s6-rc marks this service,
and recursively all that depends on it, as unavailable for this run. It
will keep running until it has started everything it has been asked to
start and that does not depend on the failing service. It then exits
nonzero.
  There is no retry policy. Users can loop around the s6-rc invocation if
they want to implement a retry policy: "s6-rc -u servicelist" is
idempotent if servicelist does not change between invocations.
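
  The error handling described above can be sketched as follows. This is a
sequential Python simulation under stated assumptions: bring_up() and its
start callback are hypothetical names, and the graph is assumed to be
acyclic, as a compiled database would guarantee.

```python
def bring_up(wanted, deps, start):
    """Start `wanted` and its dependencies; `start(name)` returns True on success.

    Returns (up, failed, unavailable, exit_code).
    """
    pending, stack = set(), list(wanted)
    while stack:
        name = stack.pop()
        if name not in pending:
            pending.add(name)
            stack.extend(deps.get(name, ()))

    up, failed, unavailable = set(), set(), set()
    progress = True
    while pending and progress:
        progress = False
        for name in sorted(pending):
            needs = set(deps.get(name, ()))
            if needs & (failed | unavailable):
                # Something below us failed: give up on this service too.
                unavailable.add(name)
            elif needs <= up:
                (up if start(name) else failed).add(name)
            else:
                continue  # still waiting on a dependency
            pending.discard(name)
            progress = True
    return up, failed, unavailable, (1 if failed else 0)


deps = {"a": ["b"], "d": ["a"]}
print(bring_up(["d", "c"], deps, lambda name: name != "b"))
```

  In this run "b" fails, so "a" and "d" are marked unavailable, while "c",
which does not depend on "b", is still started; the exit code is nonzero.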

  A oneshot start script is considered "pending" by s6-rc while it is
running; it is considered successful when it exits zero, and a failure
when it exits nonzero or is killed. A "start" action for a longrun
service is a "s6-svc -U" invocation: it is successful when the daemon is
running and has notified its readiness. A timeout can be defined, just
like with anopa.
  It is possible, though not recommended, for s6-rc to assume autoreadiness
for a longrun service (i.e. start it with "s6-svc -u").

  Symmetry and dependencies
  -------------------------

  Dependencies are provided by the user, in the source; they are tied to
atomic services (a bundle cannot depend on anything).
  An atomic service can depend on any other service. A dependency on a bundle
means a dependency on every atomic service contained in that bundle.
  s6-rc-compile reads all the dependencies and creates the complete DAG in
compiled. Cycles, of course, result in a compilation error.
s6-rc-compile also automatically creates the dual DAG of reverse
dependencies.
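
  A minimal Python sketch of that compile step, for illustration only:
compile_graph() and its input format are assumptions, and the real
s6-rc-compile produces a binary database rather than Python dicts. The
sketch expands dependencies on bundles, rejects cycles, and builds the
reverse graph.

```python
def compile_graph(atomic_deps, bundles):
    """Build (forward, reverse) dependency graphs over atomic services.

    `atomic_deps` maps each atomic service to the names it depends on
    (atomic services or bundles); `bundles` maps a bundle to its contents.
    """
    def expand(name, path=()):
        if name in atomic_deps:
            return {name}
        if name not in bundles:
            raise KeyError(f"unknown service: {name}")
        if name in path:
            raise ValueError(f"bundle {name} contains itself")
        members = set()
        for member in bundles[name]:
            members |= expand(member, path + (name,))
        return members

    forward = {name: set() for name in atomic_deps}
    for name, wanted in atomic_deps.items():
        for dep in wanted:
            forward[name] |= expand(dep)

    # Depth-first search; meeting a node that is still being visited
    # means the graph has a cycle.
    state = {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle through {name}")
        state[name] = "visiting"
        for dep in forward[name]:
            visit(dep)
        state[name] = "done"

    for name in forward:
        visit(name)

    reverse = {name: set() for name in forward}
    for name, needed in forward.items():
        for dep in needed:
            reverse[dep].add(name)
    return forward, reverse
```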

  Bringing stuff up (s6-rc -u) and bringing stuff down (s6-rc -d) are
symmetrical. They are handled the exact same way by s6-rc, calling the
"start" or "stop" script for oneshots, or calling "s6-svc -U" or
"s6-svc -d" for longruns, and using either the direct dependency graph
or the reverse dependency graph as needed.
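
  The symmetry can be shown with one Python function that serves both
directions. It is a sketch: plan() and its arguments are hypothetical, and
the action strings are only labels for the commands named above.

```python
ACTIONS = {
    ("up", "oneshot"): "run start script",
    ("up", "longrun"): "s6-svc -U",
    ("down", "oneshot"): "run stop script",
    ("down", "longrun"): "s6-svc -d",
}


def plan(direction, targets, kinds, forward, reverse):
    """Return an ordered list of (service, action) pairs.

    Going up walks the direct dependency graph, going down walks the
    reverse one; the traversal itself is identical.
    """
    graph = forward if direction == "up" else reverse
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for other in sorted(graph.get(name, ())):
            visit(other)
        ordered.append(name)  # after everything it has to wait for

    for target in sorted(targets):
        visit(target)
    return [(name, ACTIONS[(direction, kinds[name])]) for name in ordered]


kinds = {"mount": "oneshot", "smbd-log": "longrun", "smbd-run": "longrun"}
forward = {"smbd-run": {"smbd-log", "mount"}}
reverse = {"smbd-log": {"smbd-run"}, "mount": {"smbd-run"}}
print(plan("up", ["smbd-run"], kinds, forward, reverse))
print(plan("down", ["mount"], kinds, forward, reverse))
```

  Bringing "smbd-run" up handles "mount" and "smbd-log" first; bringing
"mount" down stops "smbd-run" first, because it depends on "mount".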

  Live updates
  ------------

  There's a complex thing that anopa more or less evades but that I feel is
necessary in order to be adopted by a distribution: live updates. I'm not
exactly sure yet how to proceed, but I have a vague idea, and I would like
more input on the subject.

  Users will upgrade their packages. They will sometimes need to restart
longrun services. If not much has changed, it's easy: they can do it with
s6-svc without touching the global state, so it's not s6-rc's or anopa's
concern. However, sometimes things change: new daemons are introduced,
new dependencies are introduced, etc.

  My view is that packages should provide source definitions, and after an
update, the distribution should invoke s6-rc-compile again. This is easy
enough, but then the live state does not match the current compiled
service database anymore. anopa has a similar problem with its current
service repository.

  I am thinking about a utility, "s6-rc-update", that would take the live,
the old compiled and the new compiled as inputs, and that would update
the live as smartly as possible, with carefully designed heuristics;
users could also tell s6-rc-update exactly what to do via annotations in
the source, that s6-rc-compile would translate into the new compiled.

  Tricky implementation details
  -----------------------------

  What good is a new init system if it's vulnerable to the old sysvrc
pitfalls ? :P

  One of the main issues with sysvrc is that scripts are run as scions of
the invoking shell. So, a sysvrc script run by boot scripts isn't run with
the same environment as the same script run manually by an admin, and
this is very difficult to harden.

  Supervision suites solved that problem for longrun services. Since
daemons are started by the supervision tree, and never by an admin's shell
or the boot scripts' shell, they are always started with a reproducible
environment.

  But what about oneshot services ? What about "start" and "stop" scripts ?
anopa actually runs them as children of "aa-start" and "aa-stop".
Which *may* be just as problematic. We need a way to run oneshot scripts
in the same reproducible manner as daemons.

  s6-rc does this. Every s6-rc script invocation will be reproducible.
  I'll let you guys think a little about how it does it; I'm both very
proud and very disgusted by the solution. If you manage to guess how
s6-rc does it, it means that your mind is just as warped as mine; but
no matter whether you think that's genius or that's horrible, or both,
it's something anopa does not. :)

  Nice things anopa does
  ----------------------

  There are a lot of nice things anopa does, and that I may shamelessly
copy if Olivier accepts: for instance, all the terminal manipulation.
Progress bars are shiny. :)

  However, I won't add progress bars to s6-rc if it makes it significantly
more complex (read: if it really needs heap memory). And I don't
understand the need for the pipe from aa-start to the start script:
what kind of information does aa-start give its child ? If you remove
that pipe, you can run the start script with the user's stdin, so you
don't have to add noecho support to aa-start - asking for passwords can
be performed entirely by the start script.

  I would also like to hear more about the "wants" dependencies. Is that
a thing ? What does it mean exactly ? And what is the point of
oneshots not marked "essential" ? Generally speaking, anopa separates
service ordering and service dependencies; I would like to hear more
about the goal of that differentiation. Is s6-rc's hard dependency model
("if A depends on B, then B will start first, and A will not start if B
can't be successfully started") insufficient ? Could I have some real-life
examples of this ?

  ... and that was more than long enough for a first post on the subject.
Thanks for having read that far :)

-- 
  Laurent

Received on Thu Apr 23 2015 - 15:40:33 UTC
