Re: s6 bites noob

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Thu, 31 Jan 2019 20:19:28 +0000

>mkdir test
>s6-svscan --help
>Well, that was surprising and unpleasant. It ignores unknown arguments, blithely starts a supervision tree in the current dir (my home dir), and spams me with a bunch of supervise errors. Ok, kill it.
>
>Next test:
>s6-svscan test

Do you always run programs you don't know in your home directory
with random arguments before reading the documentation? Because if
you do, then yes, you're bound to experience a few unpleasant surprises,
and s6-svscan is pretty mild in that respect. I think you should be
thankful that it didn't erase all the files in your home directory. :)


>What purpose is served by supervise automatically creating the supervise and event subdirs if there's no run file? It seems to accomplish nothing but errors and confusion. Instead of creating the subdirs, and then barfing on the absence of a run file, why not just create nothing until a run file appears?

It is impossible to portably wait for the appearance of a file.
And testing the existence of the file first, before creating the
subdirs, wouldn't help, because it would be a TOCTOU.
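
For readers unfamiliar with the term, a TOCTOU (time of check to time
of use) problem can be sketched in a few lines of shell. This is a
generic illustration, not s6 code: the temporary directory is made up
for the example, and the rm stands in for whatever other process might
touch the file between the check and the use.

```shell
#!/bin/sh
# Hypothetical demonstration of a check-then-use race; not s6 code.
set -u
dir=$(mktemp -d)
printf '#!/bin/sh\necho running\n' > "$dir/run"
chmod +x "$dir/run"

# Step 1: the check. At this instant the run file exists.
if [ -e "$dir/run" ]; then
    checked=yes
else
    checked=no
fi

# Between the check and the use, another process may remove the file.
# That is simulated here with a plain rm.
rm "$dir/run"

# Step 2: the use. The result of the earlier check is now stale.
if "$dir/run" 2>/dev/null; then
    used=ok
else
    used=failed
fi

echo "check said: $checked, use: $used"
rm -rf "$dir"
```

The check succeeds and the use still fails, so the check bought
nothing: the program has to handle the failure at the point of use
anyway.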

As you have noticed and very clearly reported, s6 is not user-friendly
- or rather, its friendliness is not expressed in a way you have been
lulled into thinking was good by other programs. Its friendliness
comes from the fact that it does not mistake you for an idiot; it
assumes that you know what you are doing, and does not waste code in
performing redundant checks. That's how it avoids bloat, among other
things.

You may find it unpleasant that s6 does not hold your hand. That is
understandable. But I assure you that as soon as you get a little
experience with it (and that can even be achieved by just reading
the documentation *before* launching a command ;)), all the
hand-holding becomes entirely unnecessary because you know what to do.


>The doc for svscan at least says that it creates the .s6-svscan subdir. The doc for supervise says nothing about creating the supervise subdir, though the doc for servicedir does say it.

I agree, the documentation isn't perfect. I'll make sure to add a
note in the s6-supervise page to mention the creation of subdirs.


>Next problem. The doc for s6-svc indicates that
>s6-svc -wu serv/foo
>
>will wait until it's up. But that's not what happens. Instead, it exits immediately.

Right. I know why this happens, and it's not exactly a bug, but I can
understand why it's confusing - and your expectation is legitimate.
So I will change the behaviour so "s6-svc -wu serv/foo" does what you
thought it would do.


> It also doesn't even try to start the service unless -u is also given, which is surprising, but technically not in contradiction of the doc.

Well *that* is perfectly intentional.


>And if -u is given, then -wu waits forever, even after the service is up. In serv/foo/run I have:
>#/bin/bash
>echo starting; sleep 2; echo dying
>
>s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise, s6-svc -wd -d serv/foo/ will stop it, but never exits.

Now that is probably due to your setup, because yours is the only
report I have of it not working. Please pastebin the output of
"strace -vf -s 256 s6-svc -uwu serv/foo" somewhere, and post the URL:
I, or other people here, will be able to tell you exactly what's going
wrong. Also, just in case, please also pastebin your sysdeps
(by default: /usr/lib/skalibs/sysdeps/sysdeps).

>So, I tried s6-rc. Set up service definition dir, compile database, create link, run s6-rc-init, etc, then finally
>s6-rc -u change foo
>
>It starts immediately, but rc then waits while foo goes through 12 to 15 start/sleep/die cycles before rc finally exits with code 0. (And foo continues cycling.) But if I press ^C on rc before it exits on its own, then it kills foo, writes a warning that it was unable to start the service because foo crashed with signal 2, and exits with code 1.

This is directly related to your issue with s6-svc above.
"s6-rc -u change foo" precisely calls "s6-svc -uwu" on foo's service
directory, and waits for it to return. Fixing s6-svc's behaviour
in your installation will also fix s6-rc's behaviour.


>So I tried it again, and this time pressed ^C on rc immediately after running it, before foo had a chance to die for the first time. It reported the same warning! The prophecy is impressive, but still, shouldn't rc just exit immediately after foo starts, and let the supervision tree independently handle foo's future death?

That is normally what happens, except that in your case s6-svc never
returns, so from s6-rc's point of view, the service is still starting.
It's the exact same issue.


>Next test: I moved run to up, changed type to oneshot, recompiled, created new link, ran s6-rc-update, and tried foo again. This time, rc hangs forever, and up is never executed at all. When I eventually press ^C on rc, though, it doesn't say unable to start foo; it says unable to start s6rc-oneshot-runner.

Related to the same issue as well. Oneshots are executed through a
longrun service named s6rc-oneshot-runner, so when you tell s6-rc to
start foo, it starts s6rc-oneshot-runner first, and since s6-svc
never returns, it fails in the same way as before.


>How to bring all up?

The absence of an option to bring up _everything in your database_
is intentional. In the usage I have in mind, the database is added to
and subtracted from by a distribution's package manager: when you
install a service, you add this service's definition to the database
(and recompile it). That means there can be way more services in a
database than the user ever intends to run at the same time - and
it also means that the definition of "everything" can be pretty
volatile, so having a "bring up everything" command would likely
do more harm than good.

The intended usage is for you to create a bundle explicitly
containing all the services you want to bring up, and to call
s6-rc -u change on this bundle. (You can name the bundle "everything"
if you like.) That way, you know exactly what services you are
starting, no matter what additions are made to the database.
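
A minimal sketch of such a bundle definition, assuming the s6-rc source
format in which a bundle is a directory holding a "type" file and a
"contents" file (check the s6-rc-compile documentation for the format
your version expects). The source directory location and the service
name "bar" are hypothetical; "foo" is the service from your tests.

```shell
#!/bin/sh
# Sketch only: the paths and the service name "bar" are hypothetical.
set -eu
src=$(mktemp -d)/source      # stand-in for your s6-rc source directory
mkdir -p "$src/everything"

# A bundle is a definition directory whose type file says "bundle"...
echo bundle > "$src/everything/type"
# ...and whose contents file lists one service name per line.
printf '%s\n' foo bar > "$src/everything/contents"

cat "$src/everything/contents"

# With s6-rc installed, and foo and bar defined in the same source
# directory, the next steps would be along the lines of:
#   s6-rc-compile /path/to/new-compiled "$src"
#   s6-rc-update /path/to/new-compiled      (on a live system)
#   s6-rc -u change everything
```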

>
>And a question about the advice in the docs. if svscan's rescan is 0, and /tmp is RAM, what's the advantage of having the scan directory be /tmp/service with symlinks to service directory copies in /tmp/services, instead of simply having /tmp/services directly be the scan directory?
>
>I guess an answer might be that there can be a race between svscan's initial scan at system startup and the populating of /tmp/services, so it sees partially copied service directories. But wouldn't a simpler solution be to either delay svscan's start until the populating is complete, or add an option to disable its initial scan?

The problem isn't only the initial scan, but _any_ scan, which can
come at any time, via an s6-svscanctl -a command for instance. Even -t0
does not protect you against an admin, or a script, requesting a scan
without you being aware of it.
That's why it is better to make sure that service directories only
appear in a scandir when they are complete - which is achieved by
creating them elsewhere then atomically symlinking them.
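
The build-elsewhere-then-symlink pattern can be sketched as follows.
The directories are temporary stand-ins for /tmp/services (the copies)
and /tmp/service (the scandir), and the run script is a placeholder;
none of it is a prescribed layout.

```shell
#!/bin/sh
# Sketch only: directory names are hypothetical stand-ins for
# /tmp/services (the copies) and /tmp/service (the scandir).
set -eu
base=$(mktemp -d)
services="$base/services"
scandir="$base/service"
mkdir -p "$services" "$scandir"

# 1. Build the service directory completely, outside the scandir.
mkdir "$services/foo"
printf '#!/bin/sh\nexec sleep 1000\n' > "$services/foo/run"
chmod +x "$services/foo/run"

# 2. Publish it in one step. The symlink is created by a single
#    symlink() call, so a scan sees either no entry at all or a
#    complete service directory, never a half-copied one.
ln -s "$services/foo" "$scandir/foo"

# 3. With s6 running, a rescan could then be requested with:
#   s6-svscanctl -a "$scandir"
ls -l "$scandir"
```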

You make good points, and I'm sure your initial impression of s6 would
have been better if you hadn't bumped against this weird s6-svc problem.
So, let's solve this fast and soothe the bite. :)

--
Laurent
Received on Thu Jan 31 2019 - 20:19:28 UTC
