supervising the supervisor

From: Jeff <sysinit_at_yandex.com>
Date: Fri, 03 May 2019 06:13:00 +0200

from the last replies we have the following possibilities regarding
process #1's supervision capabilities:

- no supervision/respawning (maybe not even handling system shutdown):
  simplifies the process #1 implementation (especially in the latter case),
  supervision can be delegated to a subprocess which also simplifies that
  supervisor's implementation since there is no need for it to handle process #1
  specific duties
  (given that process #1 has more duties than just reaping zombies as the
  default subreaper and successfully starting at least one necessary child
  process, to which the remaining duties are delegated).
  disadvantage: "incorrect" behaviour when all other processes die, which
  leads to a bricked system, deep shit ahead.

- respawning (at least one) given services/daemons, possibly even with log
  output redirection to logger processes (s6-svscan et al)

- a compromise between the above 2 solutions:
  process #1 supervises (i. e. respawns, possibly only under certain conditions)
  at most 2 subprocesses: a real "supervisor" and maybe a separate dedicated
  logger, to which the supervisor's output is redirected via pipe(2).
  
  in that case those child processes should only be respawned under certain
  conditions (respawn throttling maybe, i. e. stop respawning if one of the 2
  repeatedly fails within a certain amount of time). if those conditions are
  not met, process #1 should start a single user rescue shell (possibly via
  sulogin) and/or reboot.

  only in case the logger child process repeatedly fails: do not redirect the
  supervisor's output, use our own (possibly opened by the kernel) output fds
  for the supervisor child process (probably the console device) instead
  of the pipe fds.

  it could also be a good idea to close all of process #1's stdio fds and only
  open the console device for output when the need arises.
  this has the advantage that we do not have this device open all the time
  (in case /dev needs to get re/unmounted).
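
a minimal sketch of the respawn throttling and fallback decisions from the
compromise above. python is used only for brevity (a real process #1 would
more likely be written in C). the thresholds, the names RespawnThrottle and
next_action, and the action strings are all assumptions made for illustration,
not taken from any existing init.

```python
import time
from collections import deque


class RespawnThrottle:
    """Track recent deaths of one child (hypothetical helper).

    Respawning is allowed as long as fewer than `max_failures` deaths
    happened within the last `window` seconds. Both values are assumptions.
    """

    def __init__(self, max_failures=5, window=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self.failures = deque()

    def record_failure(self):
        """Note one child death; return True if respawning is still allowed."""
        now = self.clock()
        self.failures.append(now)
        # forget deaths that are older than the window
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        return len(self.failures) < self.max_failures


def next_action(supervisor_ok, logger_ok):
    """Decide what process #1 does next (action names are illustrative)."""
    if not supervisor_ok:
        # the supervisor itself keeps failing: rescue shell and/or reboot
        return "rescue-or-reboot"
    if not logger_ok:
        # only the logger keeps failing: drop the pipe and let the
        # supervisor write to process #1's own output fds (console)
        return "supervisor-on-console"
    return "respawn"
```

process #1 would keep one such throttle per child and feed the two results
into next_action each time a child dies.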

again (while we are at it ;-):

in the last case:
when said "supervisor" is s6-svscan (or perpd for that matter) it would be
helpful for the process #1 implementor (me) if it could manage its own output
logger via a command line option (akin to dt encore's "svscan"), since that
saves him from opening the pipe, comparing terminated child PIDs with an
additional (the logger's) PID, and handling the additional emergency
situations caused by the logger's failure himself
(especially since s6-svscan already does a lot of additional stuff like
catching signals and running the corresponding scripts anyway).
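
for illustration, the bookkeeping that would otherwise fall to process #1
(opening the pipe, starting both children, comparing a terminated child's PID
with the logger's PID) could look roughly like this. it is only a sketch in
python for brevity; spawn, start_pair and classify are hypothetical names, and
the argv lists passed in are placeholders for whatever supervisor and logger
are actually used.

```python
import os


def spawn(argv, stdin_fd=None, stdout_fd=None):
    """Fork and exec argv, optionally wiring stdin or stdout+stderr to fds."""
    pid = os.fork()
    if pid == 0:
        try:
            if stdin_fd is not None:
                os.dup2(stdin_fd, 0)
            if stdout_fd is not None:
                os.dup2(stdout_fd, 1)
                os.dup2(stdout_fd, 2)
            os.execvp(argv[0], argv)
        finally:
            # only reached when the exec failed
            os._exit(127)
    return pid


def start_pair(supervisor_argv, logger_argv):
    """Start logger and supervisor connected by a pipe.

    The pipe fds are returned so the parent can keep them open and reuse
    them when one of the two children has to be respawned.
    """
    r, w = os.pipe()  # python marks these close-on-exec by default
    logger_pid = spawn(logger_argv, stdin_fd=r)
    supervisor_pid = spawn(supervisor_argv, stdout_fd=w)
    return supervisor_pid, logger_pid, (r, w)


def classify(pid, supervisor_pid, logger_pid):
    """Tell which kind of child a reaped PID belongs to."""
    if pid == supervisor_pid:
        return "supervisor"
    if pid == logger_pid:
        return "logger"
    return "orphan"  # some reparented process, just reap it
```

the main loop would then call os.wait(), pass the returned PID to classify
and respawn (or fall back) accordingly, which is exactly the extra work a
supervisor managing its own logger would make unnecessary.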
:PP
Received on Fri May 03 2019 - 04:13:00 UTC
