Re: Process Dependency? from Laurent Bercot on 2014-10-31 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Fri, 31 Oct 2014 23:34:32 +0100

On 31/10/2014 21:04, Avery Payne wrote:

> The reason I was attempting to tackle process dependencies is
> actually driven by dbus & friends. I've stumbled on many "modern"
> processes that need to have A, and possibly B, running before they
> launch. Of course, you are right, in the perspective that dbus
> itself is encouraging a design that is problematic, and that problem
> is now extending into the issues I face when writing my (legacy?)
> scripts.

  But it's okay to have dependencies!
  If there's a problem with D-Bus, it's not that it needs to be up
early because other stuff depends on it. It's that it's way too
complex for what it does at such a low level and it bundles together
functionality that has no business being provided by a bus. But you
cannot blame software for having dependencies - system software is
written to be depended on.

> I haven't had time to think the entire state diagram through but I
> did give it some brief thought. Personally, I see current frameworks
> as potentially having an incomplete process state diagram. Most of
> them have a tri-state arrangement, and I think this is where part of
> the problem with dependencies shows up. We currently have down -> up
> -> finish -> down as a cycle. In order for dependencies to work, we
> would need a 4-state, i.e. down -> starting -> up -> finish -> down.
> (...)

  You are thinking mechanism design here, but I believe it's way too
early for that: I still fail to see how the whole thing would be beneficial.
The changes you suggest are very intrusive, so they have to bring matching
benefits. What are those benefits? What could you do with those changes
that you cannot do right now?

> The starting state is where magic happens. During starting, other
> dependencies are notified to start. If they all succeed, we go to
> up.

  So you are subjecting the starting of a process to an externally checked
global state. Why?
  A process can start 1. when the global service state changes and this
process is now wanted up, or 2. when it has died unexpectedly and the
supervisor simply restarts it to keep it alive.
  In case 1, there's no need to modify the supervisor at all: the existing
functionality is sufficient. Service management can be done on top of it,
possibly via generated scripts.
  In case 2, the global state has not changed, the process is simply wanted
up and should be restarted. If it is wanted up, then its dependencies are
*also* wanted up, and should be up. Why would you need to perform a check
at that point? Just restart the process. If something goes wrong, it will
die and try again; it's only a transient state, so it's no big deal.
Heavyweight applications for which it *is* a big deal to unsuccessfully
start up several times can have a safeguard at the top of their run script
that checks dependencies for *this* service. The dependency checking can also
be auto-generated.
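
  A minimal sketch of such a safeguard, not taken from any existing
supervision suite. It assumes a dependency counts as "up" when a given
path exists (a socket, for instance); the function names, the socket
path and the daemon command line are all hypothetical. Run scripts are
more commonly written in shell or execline; Python is used here only
for illustration.

```python
import os
import sys
import time


def missing_dependencies(paths):
    """Return the paths that do not exist yet (e.g. sockets of other services)."""
    return [p for p in paths if not os.path.exists(p)]


def guard_then_exec(dep_paths, argv, delay=5.0):
    """Exit non-zero if a dependency is missing, otherwise exec the real daemon."""
    missing = missing_dependencies(dep_paths)
    if missing:
        print("not starting: waiting for " + ", ".join(missing), file=sys.stderr)
        time.sleep(delay)  # pause so the supervisor does not restart us in a tight loop
        sys.exit(1)        # the supervisor will simply run the script again
    os.execvp(argv[0], argv)  # replace this process so the daemon itself is supervised


# Hypothetical use as the whole body of a run script:
# guard_then_exec(["/run/dbus/system_bus_socket"], ["heavy-daemon", "--foreground"])
```

  The guard lives entirely in the run script of the one service that
needs it, so the supervisor stays unmodified.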

> If we fail, we go to finish, where the dependencies are notified
> that they aren't needed by our process; it's up to either B or B's
> supervisor (not sure yet) to decide if B needs to go down.

  And thus you put service management logic into the supervisor.
I hate these #blurredlines. :P

> Looping A
> seems undesirable until you realize that your issue isn't A, it's B.
> And when A fails to start, there should be a notification to the
> effect "can't start A because of B".

  But what are you trying to achieve with this that you cannot already
do? Why can't you just let A be restarted until it works?
If A is too heavy, then specific guards can be put in place, but that
should be the exception, not the rule.

  What I can envision is keeping the global "wanted" state somewhere
along with the global "actual" state, and writing primitives, to be
called from run scripts, that block until the needed "actual" sub-state
matches the needed "wanted" sub-state. That way, processes that choose
it can block, instead of loop, until their dependencies are met. But
that's a job for a global dependency manager, not a supervision suite,
and there is still no need to modify the supervisor.
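
  One way such a blocking primitive might look, as a sketch only. It
assumes the dependency manager keeps the two states as small files,
<state_dir>/wanted/<service> and <state_dir>/actual/<service>, each
containing "up" or "down"; that layout and both function names are
hypothetical. Polling is used for brevity, and a real implementation
would more likely wait on a notification.

```python
import os
import time


def read_state(state_dir, kind, service):
    """Read 'up' or 'down' from <state_dir>/<kind>/<service>; a missing file means 'down'."""
    try:
        with open(os.path.join(state_dir, kind, service)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return "down"


def wait_until_met(state_dir, services, poll=0.5, timeout=None):
    """Block until each service's actual state equals its wanted state.

    Returns True once they all match, False if the timeout expires first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if all(
            read_state(state_dir, "actual", s) == read_state(state_dir, "wanted", s)
            for s in services
        ):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

  A run script that opts in would call wait_until_met() on its own
dependencies before exec'ing its daemon; services that do not call it
keep the plain restart-until-it-works behaviour.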

> So you can see, supervisors don't talk to each other, the overlord
> process pretty much stays as it is

  Not at all - the changes you suggest would be quite heavy on the
overlord and the supervisor. Today, overlords can send signals to
supervisors and that's it; whereas what you want involves real two-way
communication. Sure, it's simple communication with a trivial protocol,
but it's still a significant architectural change and complexity
increase.

> Yes, indirectly. Just because A wants to start, but can't because B
> is having its own issues. I see it as separation of responsibilities
> - B has to get itself off the ground to start running, but
> B-the-process doesn't have to be aware of A's needs, that's a problem
> for B's supervisor.

  Yes. So let B do its thing, and let A do its thing, and everything will
be all right eventually. If looping A is a problem, then add something in
A's run script that prevents it from doing so. I believe that's one of the
places where a service dependency manager can plug itself in.
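
  As an illustration of what a dependency manager layered on top of
unmodified supervisors could compute, here is a sketch that derives a
start order from a dependency map. The map format and the start_order
name are assumptions; graphlib is in the Python standard library from
version 3.9.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(depends_on):
    """Return service names ordered so that each one comes after its dependencies.

    depends_on maps a service name to the names it needs to be up first.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# A needs B, and B needs dbus, so dbus is started first.
order = start_order({"A": ["B"], "B": ["dbus"]})
```

  The resulting list could drive generated scripts that bring services
up one at a time, with each supervisor still only restarting its own
process.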

-- 
  Laurent

Received on Fri Oct 31 2014 - 22:34:32 UTC
