s6 bites noob from Kelly Dean on 2019-01-31 (supervision)

From: Kelly Dean <kelly_at_prtime.org>
Date: Thu, 31 Jan 2019 18:32:45 +0000

mkdir test
s6-svscan --help
Well, that was surprising and unpleasant. It ignores unknown arguments, blithely starts a supervision tree in the current dir (my home dir), and spams me with a bunch of supervise errors. Ok, kill it.

Next test:
s6-svscan test

It gives errors about supervise being unable to spawn ./run, and the child dying. What? On an empty scan dir? Oh, the previous test's accidental supervision tree ran supervise on all the current dir's subdirs--and each instance of supervise automatically created a supervise subdir of its service dir. So, now there's test/supervise, which svscan now interprets as a service dir, and starts supervise on it, which barfs.

What purpose is served by supervise automatically creating the supervise and event subdirs if there's no run file? It seems to accomplish nothing but errors and confusion. Instead of creating the subdirs, and then barfing on the absence of a run file, why not just create nothing until a run file appears?

The doc for svscan at least says that it creates the .s6-svscan subdir. The doc for supervise says nothing about creating the supervise subdir, though the doc for servicedir does say it.

Next problem. The doc for s6-svc indicates that
s6-svc -wu serv/foo

will wait until it's up. But that's not what happens. Instead, it exits immediately. It also doesn't even try to start the service unless -u is also given, which is surprising, but technically not in contradiction of the doc.

And if -u is given, then -wu waits forever, even after the service is up. In serv/foo/run I have:
#/bin/bash
echo starting; sleep 2; echo dying

s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d serv/foo/ will stop it, but never exits. supervise itself does do its job though, and perpetually restarts run after run dies while the service is set to be up.

So, I tried s6-rc. Set up service definition dir, compile database, create link, run s6-rc-init, etc, then finally
s6-rc -u change foo

It starts immediately, but rc then waits while foo goes through 12 to 15 start/sleep/die cycles before rc finally exits with code 0. (And foo continues cycling.) But if I press ^C on rc before it exits on its own, then it kills foo, writes a warning that it was unable to start the service because foo crashed with signal 2, and exits with code 1.

So I tried it again, and this time pressed ^C on rc immediately after running it, before foo had a chance to die for the first time. It reported the same warning! The prophecy is impressive, but still, shouldn't rc just exit immediately after foo starts, and let the supervision tree independently handle foo's future death?

Next test: I moved run to up, changed type to oneshot, recompiled, created new link, ran s6-rc-update, and tried foo again. This time, rc hangs forever, and up is never executed at all. When I eventually press ^C on rc, though, it doesn't say unable to start foo; it says unable to start s6rc-oneshot-runner.

This is all with default configuration for skalibs, execline, s6, and s6-rc, sourced from Github, running on Debian 9, in my home directory as a non-root user (with -c option for rc-init and -l for rc-init, rc, and rc-update, to avoid polluting system dirs while testing).
s6-rc doesn't understand a --version option, but s6-rc/NEWS says 0.4.1.1. And s6/NEWS says 2.8.0.0.

And there appears to be an option missing for s6-rc:
s6-rc -d list # List all
s6-rc -a list # List all up
s6-rc -d change foo # Bring foo down
s6-rc -u change foo # Bring foo up
s6-rc -da change # Bring all down
How to bring all up? The examples above suggest it would be
s6-rc -ua change
But that does nothing. (And the doc does indicate that it would do nothing, since there's no selection.)

And a question about the advice in the docs. if svscan's rescan is 0, and /tmp is RAM, what's the advantage of having the scan directory be /tmp/service with symlinks to service directory copies in /tmp/services, instead of simply having /tmp/services directly be the scan directory?

I guess an answer might be that there can be a race between svscan's initial scan at system startup and the populating of /tmp/services, so it sees partially copied service directories. But wouldn't a simpler solution be to either delay svscan's start until the populating is complete, or add an option to disable its initial scan?

With no initial scan, then you have to run svscanctl -a after the /tmp/services populating is complete, but you have to run that anyway even if you're using symlinks from a separate /tmp/service directory.
Received on Thu Jan 31 2019 - 18:32:45 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:44:19 UTC