Re: Thoughts on "First Class Services" from Steve Litt on 2015-04-28 (supervision)

From: Steve Litt <slitt_at_troubleshooters.com>
Date: Tue, 28 Apr 2015 13:31:43 -0400

On Tue, 28 Apr 2015 07:57:31 -0700
Avery Payne <avery.p.payne_at_gmail.com> wrote:

> On 4/28/2015 7:18 AM, bougyman wrote:
> > On Tue, Apr 28, 2015 at 8:38 AM, Avery Payne
> > <avery.p.payne_at_gmail.com> wrote:
> >
> > I guess I don't know what this means, in practice. My child services
> > generally know about the
> > parent in their ./run script and the parent (sometimes) has to know
> > about the children in his
> > ./finish script.
>
> I do the opposite using an optional flag that is initially disabled
> at installation. The short version is, using
> parent-only-knows-child-needs makes the dependency chains
> self-organizing.
>
> The long version is that, in supervision-scripts, each service
> (parent) has a list of immediate needs to be met (children).

Good! I was about to ask the definitions of parent and child, but the
preceding makes it clear.

> It
> attempts to start each child in a subdirectory named ./needs
> (coincidentally, anopa uses a compatible arrangement with ./needs).
> Each child is iterated through and checked to see if it is running;
> if it is not running, it is started, and the next child examined,
> until all children are started. If all children start, the parent
> proceeds to start. If a child fails, the parent writes a message to
> its log saying "my child X failed" and proceeds to spawn-loop.

So what you're doing here is minimizing polling, right? Instead of
saying "whoops, child not running yet, continue the runit loop", you
actually start the child, the hope being that no service will ever be
skipped and have to wait for the next iteration. Do I have that right?
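
For readers following along, here is a rough sketch of the "start my needs first" pass as described above. It is not the code from supervision-scripts; the function name start_needs and the SV variable are made up for illustration, and it assumes runit's "sv start", which can be asked to bring up a service that is already running without harm. The recursion is not in the function itself: each child's own ./run would make the same call for its own ./needs.

```shell
#!/bin/sh
# Hypothetical helper, not taken from supervision-scripts.
# SV is the control command; it defaults to runit's sv (an assumption).
SV=${SV:-sv}

start_needs() {
    # $1 is a service directory that may contain a ./needs subdirectory.
    for need in "$1"/needs/*; do
        [ -e "$need" ] || continue      # no ./needs, or nothing in it
        name=${need##*/}
        # Ask the supervisor to bring the child up; stop at the first failure.
        if ! "$SV" start "$name" >/dev/null 2>&1; then
            echo "my child $name failed" >&2
            return 1
        fi
    done
    return 0
}
```

A ./run would call something like "start_needs . || exit 1" before exec'ing its daemon, so a failed child leaves the parent to be retried by the supervisor.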

>
> As each child starts, if there is a ./needs in the child, it repeats
> the same behavior as the parent, calling other grandchildren and then
> eventually starting once all of its own dependencies are running.
> The script is the same for parent and children alike, so the behavior
> is meant to be consistent. The process is repeated for each
> grand-child, great-grand-child, etc. until the lowest dependencies
> are running, causing the entire dependency "tree" to be traversed.
> This also means the "tree" can be a "chain" or any other
> arrangement. "Leaf nodes" of that tree can be called more than once
> without harm because if they were already launched elsewhere, the
> start-up of that node is skipped. And finally, I don't need to track
> the entire state of the tree, I only need to track what children are
> needed for any given parent, making the entire thing self-organizing,
> while keeping it simple.
>
> > None of the parents know anything about their children.
> I'm just doing it backwards. If I had definitions for postgresql and
> pgbouncer, I would only add the following soft links:
>
> /etc/sv/app_which_uses_pg/needs/pgbouncer
> /etc/sv/pgbouncer/needs/postgresql
>
> That's pretty much it. Nothing else would need to be added to ./run
> for each service.
>
> > I'd like to see the difference in a code example. I haven't had a
> > chance to dig in to anopa yet enough to see how they couple it
> > more loosely. Tj
>
> Here's the current version of run.sh, with dependency support baked
> in:
> https://bitbucket.org/avery_payne/supervision-scripts/src/b8383ed5aaa1f6d848c1a85e6216e59ba98c3440/sv/.run/run.sh?at=default
>

That's a gnarly run script. It's as big as a lot of sysvinit or OpenRC
scripts I've seen. One of the reasons I like daemontools-style package
management is that my run scripts are usually less than 10 lines.

If I'm not mistaken, everything inside the "if test
$( cat ../.env/NEEDS_ENABLED ) -gt 0; then" block is boilerplate that
could be put inside a shell script callable from any ./run. That would
hack off 45 lines right there. I think you could do something similar
with everything between lines 83 and 110 of run.sh. The person who is
truly interested in the low-level details could look at the called
shell scripts (perhaps sourced with the dot operator). I'm thinking you
could knock this ./run down to fewer than 35 lines of shell script by
moving the boilerplate into separate shell scripts.
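
A minimal sketch of that factoring, assuming the flag test moves into a sourced library. The function name needs_enabled, the file name needs-lib.sh and the helper start_children are all hypothetical; only the ../.env/NEEDS_ENABLED flag file comes from the run.sh under discussion.

```shell
#!/bin/sh
# needs_enabled: succeed only if the flag file holds a number greater than 0.
# The argument stands in for ../.env/NEEDS_ENABLED.
needs_enabled() {
    flag=$(cat "$1" 2>/dev/null) || return 1    # missing file counts as disabled
    test "${flag:-0}" -gt 0 2>/dev/null         # non-numeric content also fails
}

# With the helper sourced via the dot operator, a ./run might shrink to:
#
#   #!/bin/sh
#   . ../.run/needs-lib.sh                  # hypothetical file name
#   if needs_enabled ../.env/NEEDS_ENABLED; then
#       start_children ./needs || exit 1    # hypothetical helper, same library
#   fi
#   exec mydaemon                           # placeholder daemon name
```

Sourcing rather than executing the library keeps everything in one shell process, so the final exec still replaces the ./run shell with the daemon.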

Beyond that, do you ever wonder if you're fighting the intended
behavior of runit/daemontools/etc? Their intent, as I've always
understood it, is:

foreach service
        if all your ducks are in line
                then run yourself
        else
                yield and we'll get back to you next cycle

You're doing more of a recursive start. No doubt, when there are two or
three levels of dependency and services take a non-trivial amount of
time to start (seconds), yours results in the quicker boot. But for
typical stuff, I'd imagine the old "wait till next time if your ducks
aren't in line" approach will be almost as fast, conceptually
simpler, and more codeable by the end user. Not because your method is
any harder, but because you're applying it against a program whose
native behavior is "wait till next cycle".
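
For comparison, a sketch of the "ducks in line, else wait for the next cycle" style. The function name ducks_in_line and the SV variable are made up for illustration; it assumes runit's "sv check", which exits 0 when the named service is in its requested state.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named dependency is already up.
# SV defaults to runit's sv (an assumption); it can be swapped for testing.
SV=${SV:-sv}

ducks_in_line() {
    for dep in "$@"; do
        "$SV" check "$dep" >/dev/null 2>&1 || return 1
    done
    return 0
}

# In a ./run this might read (daemon name is a placeholder):
#
#   ducks_in_line postgresql || exit 1   # give up; the supervisor reruns ./run
#   exec mydaemon
```

Nothing here starts a dependency. The service simply exits and relies on the supervisor running ./run again, which is the polling the recursive approach tries to avoid.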

You know, it's a shame that neither daemontools nor, as far as I know,
runit allows you to declare the order in which services cycle. IMHO that
would cure about 90% of the polling. Tests on children would almost
uniformly succeed, resulting in very fast boot times, always assuming
you got the order correct. And I'd imagine a makefile containing the
dependencies would yield the proper order, which would need to be
changed only when services are changed. Actually, I'd imagine that
unless some service is misbehaving, simply going in order the first time
and then assuming an indeterminate order after that would be sufficient.
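
As an illustration of deriving such an order, here is a sketch that swaps the makefile idea for the standard tsort utility and reads the dependencies from ./needs links like the ones quoted earlier. The function name start_order is hypothetical, and neither runit nor daemontools consumes this list; it only shows that the order can be computed from the links.

```shell
#!/bin/sh
# Hypothetical helper: print services in an order where every need comes
# before the service that needs it, using tsort(1).
start_order() {
    # $1 is the directory holding the service definitions (e.g. /etc/sv).
    for svc in "$1"/*/; do
        [ -d "$svc" ] || continue
        svc=${svc%/}
        name=${svc##*/}
        echo "$name $name"              # identical pair: list it, no ordering
        for need in "$svc"/needs/*; do
            [ -e "$need" ] || continue
            echo "${need##*/} $name"    # the need sorts before the service
        done
    done | tsort
}
```

With the two soft links from the pgbouncer example, "start_order /etc/sv" would list postgresql first, then pgbouncer, then app_which_uses_pg.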

And, as you said in a past email, having a run-once capability without
insane kludges would be nice. As you said in another past email,
it's not enough for the child service to be "up" according to
runit; it must also pass a test showing that the process itself is
functional. I've been doing that ever since you mentioned it.
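
One way such a functional test might look, as a sketch only. runit's sv runs an executable ./check script in the service directory, if one exists, when handling "sv start" and "sv check", and treats exit 0 as "available". The function below could be the body of such a script for a PostgreSQL service; the names service_functional and READY_CMD are made up, and pg_isready is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical ./check body: the service counts as functional only if the
# probe command succeeds, regardless of what the supervisor thinks.
# READY_CMD is a placeholder so the probe can be swapped per service.
READY_CMD=${READY_CMD:-pg_isready -q}

service_functional() {
    # Left unquoted on purpose so the command and its options are split.
    $READY_CMD >/dev/null 2>&1
}

# As a ./check script this would end with:
#   service_functional
```

The script exits with the status of the probe, so a server that is running but not yet accepting connections still reports as not ready.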

Thanks,

SteveT

Steve Litt
April 2015 featured book: Twenty Eight Tales of Troubleshooting
http://www.troubleshooters.com/28
Received on Tue Apr 28 2015 - 17:31:43 UTC
