Re: s6 init-stage1 from Laurent Bercot on 2015-01-06 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Tue, 06 Jan 2015 13:02:46 +0100

On 06/01/2015 09:00, Colin Booth wrote:

> 1. Depending on your initramfs and your on-disk layout you can skip
> mounting proc and sys. I know this is the case with Debian, probably
> true elsewhere as well.

  It all depends on the assumptions that init-stage2 makes, but yes,
now that you're mentioning it, mounting /proc and /sys may be
delayed, as long as none of the very early services need them.
Make sure the login process and interactive root shell do not need
them either, because if init-stage2 fails very early, being able to
log in will make debugging/recovery a lot easier.

> 2. If you aren't starting udev until init-stage2, you'll need to
> manually mknod null and console devices before the "Reopen
> stdin/stdout/stderr" comment.

  That only applies to people who want a static /dev. Most people
will run some flavour of udev, and will probably want to keep the
devtmpfs mounted on /dev, in which case the kernel exports
/dev/null and /dev/console itself. (Probably with the wrong rights,
but they're functional enough to get by until udev runs.)
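
  For the static /dev case, here is a minimal sketch of what "manually
mknod null and console" could look like. It assumes the standard Linux
device numbers (1:3 for null, 5:1 for console); the names EARLY_DEVICES,
is_char_device and ensure_early_devices are hypothetical, and creating
the nodes needs root.

```python
import os
import stat

# Standard Linux character devices: name -> (major, minor, mode).
# The modes are the usual ones; adjust to taste (assumption).
EARLY_DEVICES = {
    "null": (1, 3, 0o666),
    "console": (5, 1, 0o600),
}


def is_char_device(path):
    """Return True if path exists and is a character device."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


def ensure_early_devices(dev_dir="/dev"):
    """Create the early device nodes that are missing from dev_dir.

    Needs root. Raises FileExistsError if a path exists but is not a
    character device, which is safer than silently replacing it.
    """
    created = []
    for name, (major, minor, mode) in EARLY_DEVICES.items():
        path = os.path.join(dev_dir, name)
        if is_char_device(path):
            continue
        os.mknod(path, mode | stat.S_IFCHR, os.makedev(major, minor))
        os.chmod(path, mode)  # mknod's mode is filtered by the umask
        created.append(path)
    return created
```

  On a devtmpfs, ensure_early_devices() should find both nodes already
present and return an empty list.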

> 3. You'll need to either symlink /tmp into your tmpfs, mount a tmpfs
> on /tmp as part of init-stage1, or remount / to rw before s6-svscan is
> loaded. Otherwise the catch-all logger won't be able to do its thing
> as written. Same deal with /service, though that one is documented and
> expected.

  Actually, none of those 3 things is needed for /tmp. :)
  What *is* needed is a writable-by-root-only directory, to store the
information init needs:
  - The scan directory, which must be rw
  - rw places to store the supervise/ and event/ subdirectories of
the service directories, or a copy of the service directories
themselves
  - a rw place for the catch-all logger to run

  /tmp is not ideal for this, for several reasons. One is that as
soon as stage 2 begins and user stuff runs on the system, creating
files in /tmp isn't absolutely secure anymore, because filenames can
be predicted and DoSsed. Another reason is conceptual: the information
we need to store is not exactly temporary, it's not the throwaway
stuff you'd expect to see in /tmp. On the contrary, it's vital to the
system. So it's very unsightly to put it in /tmp.

  I very much dislike having / read-write. In desktops or other systems
where /etc is not really static, it is unfortunately unavoidable
(unless symlinks to /var are made: for instance, /etc/resolv.conf should
be a symlink to /var/etc/resolv.conf or something. But you cannot store,
for instance, /etc/passwd on /var...).
  But on servers and embedded systems, / should definitely be read-only.
Having it read-write makes it susceptible to filesystem corruption,
which kills the guarantee that your machine will boot to at least a
debuggable state. A read-only / saves you the hassle of having a
recovery system.
  So, it should be the admin's choice, and I do not want s6 to force
the admin to mount / rw.

  That is why I'm saying that s6 needs a tmpfs, distinct from /tmp,
made in stage 1. Having a "private" tmpfs allows init to store the
scan directory, the copies of service directories, and the catch-all
logger directory, without impacting the rest of the system.
  Since that tmpfs is needed anyway, /tmp might as well be a symlink
to a public (mode 1777) subdirectory of it: it makes /proc/mounts
cleaner. But it's not a requirement, and /tmp may be mounted as a
separate tmpfs at some point in stage 2.
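
  To make the layout concrete, here is a rough sketch of populating such
a private tmpfs. It assumes the tmpfs has already been mounted at `root`
earlier in stage 1; the subdirectory names and the function name
populate_private_tmpfs are made up for illustration and are not s6
conventions.

```python
import os

# Hypothetical layout: subdirectory name -> mode.
# Everything is writable by root only, except the public tmp.
LAYOUT = {
    "service": 0o755,        # the scan directory
    "services": 0o755,       # copies of the service directories
    "uncaught-logs": 0o755,  # where the catch-all logger runs
    "tmp": 0o1777,           # public, sticky; /tmp may symlink here
}


def populate_private_tmpfs(root):
    """Create the subdirectories under an already-mounted tmpfs.

    Returns a dict mapping each name to its full path. Safe to call
    more than once.
    """
    paths = {}
    for name, mode in LAYOUT.items():
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        os.chmod(path, mode)  # makedirs' mode is filtered by the umask
        paths[name] = path
    return paths
```

  Stage 1 would then point s6-svscan at the scan directory, and /tmp
could optionally be a symlink to the "tmp" subdirectory.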

  If you are reckless, totally insensitive to gracefulness, and you
absolutely cannot deal with creating a tmpfs just for the sake of s6,
you may try to use a subdirectory of the devtmpfs in /dev as an
early root-only read-write place.
  You will now forget I suggested that. *flash*

> 4. If you don't want to have your dev mount in /mnt/tmpfs/dev (mostly
> to keep ps output non-ugly and to kind-of stick to the FHS)

  Eh, the FHS doesn't say that /dev should be a real directory. It can
be a symlink all right. I checked. :P
  Most Linux people will use udev, though, and for them /dev will be a
devtmpfs: a real directory, and a mountpoint.

> 5. I made a few more classes of services for init-stage2 to copy into
> the service directory. Specifically for things that I wanted running
> ASAP and were udev agnostic. Those were: syslogd (using s6-ipcserver
> and ucspilogd), klogd, cron, and udev. Mostly that was because I
> needed udev running (and supervised) before bringing up dbus, and I
> wanted to make sure /dev/log had a reader before I started bringing
> anything up that might not want to talk to stdout instead (openssh,
> I'm looking at you).

  The order in which init-stage2 starts services and interleaves them
with one-shot commands should mirror your dependency graph. This is
where a dependency management system would come in handy; I plan to
work on a program that takes a dependency graph as its input (format
TBD) and outputs a suitable init-stage2 script.

  (Crazy idea brewing. Dependency graph management is a solved problem:
it's exactly what "make" does. So my program could simply translate
the service dependency graph into a Makefile, and make would
output the script. This requires more thought.)
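
  As an illustration only, here is a small sketch of the same idea using
a plain topological sort (Python's graphlib) instead of make. The input
format, the function name render_stage2 and the `start_service` shell
command it emits are all hypothetical; the example graph just reuses
the ordering constraints quoted above.

```python
from graphlib import TopologicalSorter


def render_stage2(deps):
    """Turn a dependency graph into an ordered script.

    deps maps each service name to the set of services that must be
    started before it. Returns (order, script_text).
    """
    order = list(TopologicalSorter(deps).static_order())
    lines = ["#!/bin/sh"]
    # start_service is a placeholder, not a real s6 command.
    lines.extend(f"start_service {name}" for name in order)
    return order, "\n".join(lines) + "\n"


# udev must run before dbus; /dev/log needs a reader before openssh.
example = {
    "udev": set(),
    "syslogd": set(),
    "dbus": {"udev"},
    "openssh": {"syslogd"},
}
order, script = render_stage2(example)
```

  A circular dependency makes static_order() raise graphlib.CycleError,
which is roughly the diagnostic make would give for the same mistake.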

> Everything between the fdclose line and reopening stdin is super
> fragile, and since we've unmounted /dev, it's impossible to boot
> half-way and then start a shell to find out what exactly went wrong.

  I will definitely be working on an s6-init package to automate all
this and make sure the fragile part is as brief as possible. The
really risky stuff is replacing /dev/console under init's nose; for
udev users, this won't even happen, so stage 1 will be practically
safe.

  Thanks for your comments!

-- 
  Laurent

Received on Tue Jan 06 2015 - 12:02:46 UTC
