Re: taxonomy of dependencies from post-sysv on 2015-05-14 (supervision)

From: post-sysv <boycottsystemd_at_openmailbox.org>
Date: Thu, 14 May 2015 19:26:13 -0400

(Gah, idiotically neglected the proper destination.)

It seems to me that the dominant philosophy for fault tolerance is,
indeed, to let it crash and let the individual state management of
services, coupled with any existing supervisor trees and priority
rankings, do their work.

From my personal observations, the issue of dependencies for OS
services has always been a rather finicky one. If the dependency system
is too primitive, it becomes nearly indistinguishable from service
ordering. If it is too complex, then one must inevitably deal with
scheduling policies and transactions, which in the context of managing
OS processes seem bound to cause reliability issues.

Of course, your particular example would be made less gruesome simply by
introducing a rate limit on startup failures. This strategy seems to be
employed frequently in launchd setups.
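
To make the rate-limit idea concrete, here is a rough sketch of a
sliding-window restart throttle. The class name, the method name and
the burst/window numbers are all made up for illustration; this is not
how launchd or any particular supervisor implements it.

```python
from collections import deque


class RestartThrottle:
    """Allow at most `burst` restarts within any `window` seconds.

    Hypothetical helper for illustration; not taken from any real supervisor.
    """

    def __init__(self, burst=5, window=10.0):
        self.burst = burst
        self.window = window
        self._starts = deque()  # timestamps of recent start attempts

    def delay_before_restart(self, now):
        """Return how many seconds to wait before the next restart.

        A return value of 0.0 means "go ahead now" and records the attempt.
        A positive value means "sleep that long, then ask again".
        """
        # Forget attempts that have aged out of the window.
        while self._starts and now - self._starts[0] >= self.window:
            self._starts.popleft()
        if len(self._starts) < self.burst:
            self._starts.append(now)
            return 0.0
        # Budget exhausted: wait until the oldest attempt leaves the window.
        return self._starts[0] + self.window - now
```

A supervisor loop would call delay_before_restart(time.monotonic())
each time the service dies and sleep for the returned interval, so a
client that keeps crashing while its database is still coming up costs
a few start attempts per window rather than a tight loop.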

Readiness protocols require application-specific integration, do they
not? Even if it's something as simple as writing to a FIFO. Though I
suppose this is unavoidable in sufficiently complex setups where you'll
have lots of context-specific logic anyway.
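
For what it's worth, the FIFO variant I have in mind would look roughly
like the sketch below. The function names, the "ready" token and the
path are my own assumptions, not any existing protocol, and a real
waiter would want a timeout.

```python
import os
import tempfile
import threading
import time


def notify_ready(fifo_path):
    """Service side: call once initialisation is finished."""
    # Opening for write blocks until the waiter has the FIFO open for reading.
    with open(fifo_path, "w") as fifo:
        fifo.write("ready\n")


def wait_until_ready(fifo_path):
    """Supervisor side: block until the service reports readiness."""
    with open(fifo_path) as fifo:
        return fifo.readline().strip() == "ready"


def demo():
    """Run both sides in one process, with a thread standing in for the service."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "ready.fifo")
    os.mkfifo(fifo_path)

    def service():
        time.sleep(0.2)  # stand-in for slow initialisation
        notify_ready(fifo_path)

    worker = threading.Thread(target=service)
    worker.start()
    ready = wait_until_ready(fifo_path)
    worker.join()

    os.remove(fifo_path)
    os.rmdir(workdir)
    return ready
```

The integration cost is the single notify_ready() call, but it has to
sit at the point where the application itself considers its
initialisation done, which is exactly the application-specific part.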

On 05/14/2015 05:58 PM, Jonathan de Boyne Pollard wrote:
> Wayne Marshall:
>> Under a supervision framework, failure of a service starting is
>> absolutely ok. (Many novices fail to grasp the elegance of this
>> essential feature.)
>
> ... and novices and non-novices alike fail to grasp its
> unscalability. It may be fine on a hobbyist PC, but on a server in a
> datacentre one gets situations like a program that needs two database
> servers and a message queue broker to be up and ready before it can
> run, which one is running 10 instances of for scalability. 10 client
> programs crashing and restarting over and over whilst rabbitmq-server
> and mysqld are trying to come up do not make for a happy startup. "I
> want", says the system administrator, "my machine to spend its
> precious processor and disc on bringing up the things that everything
> is waiting for, not on repeatedly starting and crashing the things
> that are doing the waiting." Let us not forget the logfile and
> monitoring system noise that the thundering herd approach engenders, too.
>
> Two things make this world more tolerable: early server socket opening
> and readiness protocols. Unfortunately, much "enterprise" software
> has yet to even embrace the former, let alone the latter. But there
> are some promising tiny green shoots. Early server socket opening
> makes clients _block_ rather than _abend_. Readiness protocols fill in
> corner cases that aren't necessarily strictly client-server, and also
> deal with the fact that "up for over N seconds" may or may not mean
> "ready" according to what day of the week it is (i.e. what the system
> activity pattern happens to be at the time).
>
> Wayne Marshall:
>> Note also that in no case is it necessary for a service runscript to
>> try starting dependencies itself -- this is all left to the supervisor.
>
> It need not even be the purview of the service manager. nosh doesn't
> do dependency processing in either the "run" programs or the service
> manager. It does it in the "system-control" program. Dependency
> processing is "policy", the decision of what to start and what to
> stop, in what order, and when. Service management is "mechanism", the
> raw mechanics of service state. With this split, one can even have
> two "policies", system-control and service-dt-scanner, running at the
> same time. Or someone could come along and write a third, indeed.
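
As I understand the point about early server socket opening, it amounts
to something like the sketch below: the listening socket exists before
the slow initialisation starts, so an early client queues in the
kernel backlog and blocks instead of getting "connection refused". The
function names, the ping/pong exchange and the timings are invented for
the illustration.

```python
import socket
import threading
import time


def run_server(listener, init_seconds, received):
    """Finish slow initialisation, then serve the one client already queued."""
    time.sleep(init_seconds)  # stand-in for loading data, warming caches, etc.
    conn, _ = listener.accept()
    with conn:
        received.append(conn.recv(1024))
        conn.sendall(b"pong")


def demo():
    # Bind and listen *before* initialisation so that early clients queue in
    # the kernel backlog rather than being refused.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    received = []
    server = threading.Thread(target=run_server, args=(listener, 0.2, received))
    server.start()

    # The client arrives while the server is still "initialising".
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)  # blocks until the server is actually ready

    server.join()
    listener.close()
    return received[0], reply
```

The client never sees a failure here; it simply waits in recv() until
the server gets around to accept(), which is the block-rather-than-abend
behaviour described above.
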
Received on Thu May 14 2015 - 23:26:13 UTC