Fwd: Re: Could s6-scscan ignore non-servicedir folders? from Avery Payne on 2015-01-21 (supervision)

From: Avery Payne <avery.p.payne_at_gmail.com>
Date: Wed, 21 Jan 2015 11:42:17 -0800

Ugh, sorry folks. I keep forgetting to change the address. Forwarded a
copy.

-------- Forwarded Message --------
Subject: Re: Could s6-scscan ignore non-servicedir folders?
Date: Wed, 21 Jan 2015 11:40:58 -0800
From: Avery Payne <avery.p.payne_at_gmail.com>
To: Olivier Brunel <jjk_at_jjacky.com>

On 1/21/2015 9:24 AM, Olivier Brunel wrote:
> I'll have to set up some scripts for different init stages, using
> s6-svscan as stage 2, as you've described elsewhere. But I also want to
> have a system to start (and stop) services in order. I see this whole
> idea of order/dependency is something that is being talked about, but
> currently not supported.
Dependency handling is tricky, not from a nuts-and-bolts mechanical
perspective, but from a getting-it-right perspective. There's older
discussion on the mailing list about this, and the short version is:
yes, it can be done, but if you don't have a good notification
mechanism, it's a very weak guarantee. I've already tackled this with
my current set of scripts using an option flag that allows you to have
chain-dependencies for services. It works fine, but there's no hard
guarantee that it will be consistent.
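
To make the "weak guarantee" concrete, here is a minimal Python sketch of
chain-dependency ordering. It is not how supervision-scripts does it; the
service names and the NEEDS table are made up for illustration. It only
computes a start order. It says nothing about whether a service is actually
*ready* once started, which is exactly the part that needs a notification
mechanism.

```python
from graphlib import TopologicalSorter, CycleError

def start_order(needs):
    """Return service names ordered so each one comes after everything it needs.

    `needs` maps a service name to the set of services it depends on.
    Raises CycleError if the dependencies loop back on themselves.
    """
    return list(TopologicalSorter(needs).static_order())

# Hypothetical services, for illustration only.
NEEDS = {
    "webapp": {"database"},
    "database": {"network"},
    "network": set(),
}

print(start_order(NEEDS))  # ['network', 'database', 'webapp']
```

Starting things in this order is the easy, mechanical half. Without a
readiness signal, "database" may have been launched but not yet be accepting
connections when "webapp" starts.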

>
> Furthermore, I want this system of mine to include other kinds of
> services, that is one-time processes/scripts that need to be run once (on
> boot), and that's it. And to make things simpler, I want to have it all
> work together, mixing longrun services (s6 supervised) and oneshot
> services when it comes to dependency/order definition.
>
> So I'll have servicedirs of sorts, for oneshot services. And I'm planning
> on having one folder, that I tend to call runtime repository, but that
> would also be the scandir for s6-svscan.
>
> Obviously though, those aren't servicedirs in the s6 meaning, they
> shouldn't be supervised, so I'd like for s6-svscan to check if a folder
> does in fact have a file run, and if not to simply skip/ignore it.
>
> That way I can have all my (longrun & oneshot) servicedirs under one
> parent, and it shouldn't really break anything, since a folder without a
> run file would not be really useful to supervise anyways, as it would
> just try & fail to start it every 10 seconds.
>
> The only case I see would be a folder created, scanned, and only
> afterwards the run script be copied in there. But that sounds like a
> very unlikely/rare scenario. (And in case it happened, one could just
> trigger a rescan to fix it.)
>
> So, what do you think of this? Would you be willing to have s6-svscan
> ignore folders not containing a run file?
One-shot is something I've been mulling over for about two months now.
There isn't a quick and elegant way to just make it work. But I might
have a solution, based on what I am currently thinking about doing in
supervision-scripts. Here's a variant of that thought, for use with
"startup":

1. Create a service definition that has an s6-svscan launch in it. We
will call it "startup".
2. Mark the "startup" definition with a ./down file.
3. Create your separate, sequenced launches in a separate directory that
is not in your /service directory. We will call it "startup-settings".
4. In "startup-settings" place all of your one-shot scripts to be run at
initialization. Mark all of them with ./down files.
5. Use dependency resolution to chain the scripts inside of
"startup-settings" together, with the top-most scripts NOT having a
./down file. Currently nosh allows for this, and s6 will have it in the
near future. You can emulate it with supervision-scripts, but only as a
last resort, as it relies heavily on return codes and ./check scripts.
6. When s6 launches, do a "s6-svc -u startup". The "startup"
definition will wake up and launch s6-svscan. The scan process will
bring up all of the definitions that do NOT have a ./down file in
"startup-settings". Keep in mind that "startup-settings" is a separate
directory, but "startup" will be part of the existing supervision tree.
Those definitions in "startup-settings" will turn around and call all of
the others that are marked ./down with "s6-svc -o (whatever)" - that
is their sole job in life. Now for the heavy magic: Because the "run
once" definitions are dependency-chained, all of those will launch in
the correct sequence, in parallel. Visualize it as a set of trees, with
the base of each tree as the non-down definition, and each branch or
leaf as a down definition. Each branch, by virtue of dependency, calls
the next branch or leaf that it needs, waiting for its children to be
"up". As each leaf on the tree finishes, it runs "s6-svc -d" on itself,
and the ./down file is left as-is. This means each leaf or branch will
clean up after itself and will not re-launch.
Also, as each branch or leaf terminates, the "parent" will receive
notice that its children have finished, and once all leaves/branches
have run, will then run and "do what it needs to do". Eventually the
entire tree will have run in the correct sequence. The beauty is, if
you have *independent* trees (branches and leaves that aren't dependent
on another set of branches and leaves in a different tree), you can
launch them in parallel, and it still "just works".
7. Once all of your trees have completed, "startup" will need to be
notified of that, and will then run "s6-svc -d startup" on itself. The
entire tree - including any strays that are wedged and haven't
terminated - will come down, and "startup" will exit. There is a danger
here - a poorly written definition may fail to complete correctly but
signal completion anyway, leaving inconsistent state. But at least
you don't have a massive tree of processes floating around at the end
of it. Logging is highly recommended during this process to ensure
that state was set up and recorded correctly.
8. You can move on to bringing up the rest of the services "normally".
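
The steps above can be sketched as a small Python simulation. This is only a
model of the idea, not s6 or supervision-scripts code: the definition names
and the CHILDREN table are hypothetical, the "scan" is a plain directory
listing that follows the description in step 6, and threads stand in for
supervised processes. It lays out "startup-settings" in a temporary
directory, marks every non-base definition with a ./down file, finds the
bases the way the scan would, and then runs each tree so that children
finish before their parent while independent trees run in parallel.

```python
import tempfile
import threading
from pathlib import Path

# Hypothetical one-shot definitions: name -> definitions it must wait for.
CHILDREN = {
    "network": ["udev", "sysctl"],
    "udev": ["mount-dev"],
    "sysctl": [],
    "mount-dev": [],
    "storage": ["fsck"],
    "fsck": [],
}

def build_settings(base, children_of):
    """Create one directory per definition. Anything that is somebody's
    child gets a ./down file, so only the base of each tree lacks one."""
    has_parent = {c for kids in children_of.values() for c in kids}
    for name in children_of:
        d = base / name
        d.mkdir(parents=True)
        (d / "run").write_text("#!/bin/sh\n# placeholder one-shot script\n")
        if name in has_parent:
            (d / "down").touch()

def scan_roots(base):
    """Model of the scan in step 6: definitions WITHOUT a ./down file."""
    return sorted(d.name for d in base.iterdir()
                  if d.is_dir() and not (d / "down").exists())

def run_tree(name, children_of, done, lock):
    """Start all children in parallel, wait for them, then run `name`."""
    workers = [threading.Thread(target=run_tree,
                                args=(child, children_of, done, lock))
               for child in children_of[name]]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    with lock:
        done.append(name)  # stand-in for the real one-shot work

def run_startup(base, children_of):
    """Run every tree found by the scan; return the completion order."""
    done, lock = [], threading.Lock()
    trees = [threading.Thread(target=run_tree,
                              args=(root, children_of, done, lock))
             for root in scan_roots(base)]
    for t in trees:
        t.start()
    for t in trees:
        t.join()
    return done

with tempfile.TemporaryDirectory() as tmp:
    settings = Path(tmp) / "startup-settings"
    build_settings(settings, CHILDREN)
    roots = scan_roots(settings)
    order = run_startup(settings, CHILDREN)

print(roots)  # ['network', 'storage']
print(order)  # varies between runs, but children always precede parents
```

The sketch assumes real trees, where each definition has exactly one parent.
A definition shared by two parents would run twice here, and nothing in it
detects a leaf that hangs or reports success without doing its work, which
is the danger noted in step 7.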

The supervision-scripts project was NOT meant to deal with system
start-up or shut-down, as those are very specific to your environment.
The project specifically targets how the system will run during "normal"
operation. But I am planning on borrowing this concept for other needs,
until others have figured out how they want to deal with state
management. Using this is a bit process-intensive but it will give me
state management without the lingering supervision processes, or the PID
consumption.

As you can see, dependency resolution is the key, and how it is
implemented will make or break your system. This is just a suggestion,
and I can't speak to how it will play out until I can actually try it.
Received on Wed Jan 21 2015 - 19:42:17 UTC
