Re: Process Dependency?

From: Avery Payne <avery.p.payne_at_gmail.com>
Date: Fri, 31 Oct 2014 18:49:27 -0700

Admittedly my posting was hurried.

On Fri, Oct 31, 2014 at 3:34 PM, Laurent Bercot <ska-supervision_at_skarnet.org> wrote:

> On 31/10/2014 21:04, Avery Payne wrote:
>
>> Of course, you are right, in the perspective that dbus
>> itself is encouraging a design that is problematic, and that problem
>> is now extending into the issues I face when writing my (legacy?)
>> scripts.
>>
>
> But it's okay to have dependencies !
>

I was picking on dbus because from a practical standpoint, it's a thorn in
my side. Writing the ./run script for lightdm was...traumatic. Nothing
like a flickering screen that prevents you from entering "sv stop lightdm"
because it keeps switching to vt7, then dies, switches to vt7, then dies,
switches to vt7...all because dbus didn't "report back". It was rather
nasty.


>> We currently have down -> up
>> -> finish -> down as a cycle. In order for dependencies to work, we
>> would need a 4-state, i.e. down -> starting -> up -> finish -> down.
>> (...)
>>
>
> You are thinking mechanism design here, but I believe it's way too
> early for that: I still fail to see how the whole thing would be
> beneficial.
>

I'm probably twisting things quite a lot. The idea was to have a logical
state (as part of a finite state machine) that we want to be "in", and then
have the software attempt to align the actual state with the logical one.
Once aligned, they are "in sync" and considered valid. If they are not in
sync, it keeps trying to align them. It's probably too simplistic.
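To make the "align the actual state with the logical state" idea a little more concrete, here is a minimal Python sketch built on the four-state cycle quoted above (down -> starting -> up -> finish -> down). The names `State`, `NEXT` and `reconcile` are hypothetical, and the sketch assumes every transition succeeds; a real supervisor would also have to deal with a start that fails or times out.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    DOWN = "down"
    STARTING = "starting"
    UP = "up"
    FINISH = "finish"


# The proposed 4-state cycle: down -> starting -> up -> finish -> down.
NEXT = {
    State.DOWN: State.STARTING,
    State.STARTING: State.UP,
    State.UP: State.FINISH,
    State.FINISH: State.DOWN,
}


def reconcile(actual: State, wanted: State) -> list[State]:
    """Step the actual state around the cycle until it matches the wanted
    (logical) state, returning every state visited. Because the cycle covers
    all four states, the loop always terminates."""
    path = [actual]
    while actual != wanted:      # not "in sync" yet: keep trying to align
        actual = NEXT[actual]
        path.append(actual)
    return path


print([s.value for s in reconcile(State.DOWN, State.UP)])  # ['down', 'starting', 'up']
print([s.value for s in reconcile(State.UP, State.DOWN)])  # ['up', 'finish', 'down']
```

Once `actual == wanted` the two are "in sync" and nothing more happens; that is the whole appeal, and also why it may be too simplistic.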


> The changes you suggest are very intrusive, so they have to bring matching
> benefits. What are those benefits ? What could you do with those changes
> that you cannot do right now ?


There's no guesswork about cleanup inside ./finish, which today may or
may not need to touch "global state". Instead, ./finish is always a "local"
cleanup, i.e. it doesn't care about anything other than cleaning up
after itself, and I don't have to worry about some secondary dependency
that I left running behind.


> So you are subjecting the starting of a process to an externally checked
> global state. Why ?
>

There's some subtlety here. In this scheme, the only state tracked
is "how many times has some other service needed Service X to be up".


> The dependency checking can also
> be auto-generated.
>
>> If we fail, we go to finish, where the dependencies are notified
>> that they aren't needed by our process; it's up to either B or B's
>> supervisor (not sure yet) to decide if B needs to go down.
>>
>
> And thus you put service management logic into the supervisor.
> I hate these #blurredlines. :P


Actually, I didn't write clearly, and that was my fault. I'll walk through
it and clean up my examples this time. In this example, when I say
"service" I mean a process, not "service management".

1. Sysadmin asks Service A to start.

2. The svscan process "sees" that Service A has a ./needs directory.

3. Svscan walks the directory entries for ./needs and "starts" each
symlink, one at a time, as if someone asked that service to start
normally. Each successful start increments a single counter for that
service after-the-fact. The counter-per-service is the only change, and is
the "global state" you are talking about.

4. If a ./needs entry fails to start, or times out, svscan kills whatever it
was working on, and then walks backwards through the list of ./needs it just
started (we were just there! we should be able to do this). It decrements
the counter for each entry as it visits it during the walk-back. For each
Service X that it visits, if the counter is zero after the decrement, AND
Service X is not marked "wanted up", then svscan signals Service X to shut
down normally through its supervisor; otherwise Service X is left running.
From there everything happens as if Service X had been asked to shut down
normally, i.e. ./finish is run, etc.

5. If all of the ./needs are reported as up, then the supervisor for
Service A is started as normal.
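The five steps can be sketched as a small Python simulation. Everything in it is hypothetical: `Service`, `Overlord`, `request_up` and the `start_succeeds` flag are invented for illustration, there are no real processes, signals or timeouts, and only one level of ./needs is followed (a dependency's own ./needs are ignored).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    needs: list[str] = field(default_factory=list)  # stands in for the ./needs symlinks
    start_succeeds: bool = True   # simulated outcome: comes up before the timeout or not
    wanted_up: bool = False       # explicitly asked to be up by the sysadmin
    up: bool = False
    count: int = 0                # how many times another service has needed this one


class Overlord:
    """Toy stand-in for svscan; stop() plays the role of signalling a supervisor."""

    def __init__(self, services: list[Service]) -> None:
        self.svc = {s.name: s for s in services}
        self.log: list[str] = []

    def start(self, s: Service) -> bool:
        # Start "as if someone asked normally"; an already-running service is left alone.
        if not s.up:
            s.up = s.start_succeeds
            self.log.append(f"start {s.name}: {'ok' if s.up else 'failed'}")
        return s.up

    def stop(self, s: Service) -> None:
        # Normal shutdown through the supervisor; ./finish would run here.
        s.up = False
        self.log.append(f"stop {s.name}")

    def request_up(self, name: str) -> bool:
        a = self.svc[name]
        a.wanted_up = True                              # step 1
        started: list[Service] = []
        for dep in (self.svc[n] for n in a.needs):      # steps 2-3, one at a time
            if not self.start(dep):
                self.stop(dep)                          # step 4: kill the failed start
                for prev in reversed(started):          # ...then walk back
                    prev.count -= 1
                    if prev.count == 0 and not prev.wanted_up:
                        self.stop(prev)
                return False                            # Service A never starts
            dep.count += 1                              # counted after the fact
            started.append(dep)
        return self.start(a)                            # step 5


o = Overlord([
    Service("A", needs=["X", "Y", "Z"]),
    Service("X"),
    Service("Y", wanted_up=True, up=True),
    Service("Z", start_succeeds=False),
])
print(o.request_up("A"))  # False
print(o.log)              # ['start X: ok', 'start Z: failed', 'stop Z', 'stop X']
```

In the example, Z fails, so the walk-back decrements Y and X. Y stays up because it is marked "wanted up", while X, needed by nobody else, is shut down through the normal path.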

That's pretty much it. If we fail, Service A never starts, and svscan can
clean up *by asking all the dependencies to clean themselves up*, using the
existing mechanisms for shutting down a service. If we succeed, nothing
changes from what happens today. Process dependency moves out of the run
script and into a place that can "see" all the other processes, instead of
needing a helper inside the script to "ask" whether a process is up.

* * * * *

OK, yeah, I'm looking at the list now and I can see some objections to it.
The tally table is probably not ideal because it consumes RAM. I'm
picturing a one-time allocation of a block of memory that holds a pointer
(which somehow points to the supervisor process for a given service)
and a small integer, times however many process slots you wish to
maintain. So memory consumption goes up. Then there is always a risk of
the actual and recorded needs getting out of sync (the tally doesn't match
the actual need requests), but I think that risk is very low if svscan is
keeping its finger on the pulse of everything. Then there is the
question: "how do we report that Service A failed to start because Service
X failed?"
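A rough Python sketch of that one-time tally table, assuming a supervisor PID can stand in for the "pointer" and a 16-bit unsigned integer is a big enough counter. `TallyTable`, `need` and `release` are hypothetical names; the point is only that the storage is allocated once and costs a fixed number of bytes per slot.

```python
from array import array


class TallyTable:
    """Hypothetical tally table allocated once: one slot per supervised process."""

    def __init__(self, slots: int) -> None:
        # Both arrays are sized here and never grow afterwards.
        self.pids = array("i", [0] * slots)    # supervisor PID per slot; 0 marks a free slot
        self.counts = array("H", [0] * slots)  # small unsigned "needed by" counter

    def _slot(self, pid: int) -> int:
        """Return the index of the slot holding pid, or -1 if there is none."""
        for i, p in enumerate(self.pids):
            if p == pid:
                return i
        return -1

    def need(self, pid: int) -> int:
        """Record one more service needing pid; return the new count."""
        if pid <= 0:
            raise ValueError("pid must be positive")
        i = self._slot(pid)
        if i < 0:
            i = self._slot(0)                  # claim the first free slot
            if i < 0:
                raise MemoryError("tally table is full")
            self.pids[i] = pid
        self.counts[i] += 1
        return self.counts[i]

    def release(self, pid: int) -> int:
        """Record one fewer service needing pid; return the remaining count."""
        i = self._slot(pid) if pid > 0 else -1
        if i < 0:
            # No recorded need: the tally and reality have drifted apart.
            raise KeyError(pid)
        self.counts[i] -= 1
        if self.counts[i] == 0:
            self.pids[i] = 0                   # free the slot for reuse
        return self.counts[i]

    def nbytes(self) -> int:
        """Bytes used by the slot storage itself (ignoring Python object overhead)."""
        return len(self.pids) * self.pids.itemsize + len(self.counts) * self.counts.itemsize
```

With type codes `"i"` and `"H"` each slot typically costs 4 + 2 = 6 bytes, so the extra RAM grows linearly with the number of slots. A `release` with no matching `need` raises `KeyError`, which is one place a tally/reality desync would surface.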

Anyways, it was just an idea. And yes, script generators would solve all
of the above. :)


> Not at all - the changes you suggest would be quite heavy on the
> overlord and the supervisor. Today, overlords can send signals to
> supervisors and that's it; whereas what you want involves real two-way
> communication.


Again, that was my fault for not clearly working through it. In the
example, the overlord has everything needed to bring up the dependencies.

That's it for now. I think I've probably picked your brain enough as it is,
so I'll lay off suggesting silly ideas in code. Which means all of my
other annoying questions will be about the legacy scripts I'm working on.
:) Thanks for being patient.
Received on Sat Nov 01 2014 - 01:49:27 UTC