Re: logpipe error handling patch from Laurent Bercot on 2014-07-29 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Wed, 30 Jul 2014 00:37:47 +0100

On 29/07/2014 20:59, Jan Pobrislo wrote:
> There are two possible resolutions - either make runsv create new pipe
> when asked to start up again or make it die since it's unable to spawn
> a child.

  Actually, there's only one possible resolution.
  When runsv is told to exit, it kills the service and closes the log pipe,
which is impossible to reopen. At this point, the service is dysfunctional
and there's no way to recover it: so runsv should not be accepting commands
anymore, and close its control pipe instead. Any sv command after "sv x"
should return an error, because the service is not in a state where it can
process commands; it will only be functional again when the logger has died,
runsv has died and runsvdir has restarted runsv.
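
  To illustrate what "close its control pipe" buys you, here is a minimal
sketch. It assumes the control channel is a FIFO and that the client opens
it write-only and non-blocking. The names control_open() and control_send()
are made up for this example and are not runit functions. It relies only on
standard POSIX FIFO semantics: once the supervisor has closed its read end,
the client's open() fails with ENXIO, so the command returns an error
instead of being queued for a service that cannot process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Supervisor side (hypothetical helper): create the control FIFO if it
 * does not exist yet and open its read end without blocking.
 * Returns the read descriptor, or -1 with errno set. */
int control_open(const char *path)
{
    if (mkfifo(path, 0600) == -1 && errno != EEXIST)
        return -1;
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Client side (hypothetical helper): send a one-byte command.
 * Returns 0 on success, -1 with errno set on failure. When nobody holds
 * the read end open, open() fails and errno is ENXIO. */
int control_send(const char *path, char cmd)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, &cmd, 1);
    int saved = errno;          /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return written == 1 ? 0 : -1;
}
```

  With this arrangement the supervisor only has to close the descriptor
returned by control_open() once it has been told to exit; every later
control_send() then fails immediately.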

  runsv could even exit right after sending the SIGTERM to the service: that's
the quick and easy solution, since it closes both the log pipe and the control
pipe. However, it also means runsvdir will restart it, and the new runsv
instance will try to spawn a new logger while the old one may still be hanging
around, which could cause conflicts.
  A more advanced behaviour is to have (possibly configurable) timeouts in runsv
to ensure that the logger finally dies and the service can be restarted:
  - close the log pipe
  - give the logger some grace time
  - send it a SIGTERM (and a SIGCONT, just to be sure: stopped processes don't die)
  - give it some more grace time
  - send it a SIGKILL
  - give it a bit more time
  - panic loudly.
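
  The sequence above could be sketched roughly as follows. This is not
runsv code: stop_logger(), wait_grace(), the single grace_ms parameter and
the return codes are all assumptions made for the example, and the polling
loop is only an approximate timer. It assumes the logger is a direct child,
so that waitpid() can reap it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll for roughly grace_ms milliseconds, waiting for child `pid` to exit.
 * Returns 1 if the child was reaped, 0 on timeout, -1 on waitpid() error. */
static int wait_grace(pid_t pid, int grace_ms)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int waited;

    for (waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 1;
        if (r == -1 && errno != EINTR)
            return -1;
        if (waited >= grace_ms)
            return 0;
        nanosleep(&tick, NULL);
    }
}

/* Escalating shutdown of the logger (hypothetical helper).
 * Returns 0 if it exited after the pipe was closed, 1 if it needed
 * SIGTERM, 2 if it needed SIGKILL, and -1 if it is still around
 * (or could not be waited for), in which case the caller should
 * panic loudly. */
int stop_logger(pid_t logger, int logpipe_wr, int grace_ms)
{
    close(logpipe_wr);              /* the logger now reads EOF */
    if (wait_grace(logger, grace_ms) == 1)
        return 0;

    kill(logger, SIGTERM);
    kill(logger, SIGCONT);          /* wake it up in case it is stopped */
    if (wait_grace(logger, grace_ms) == 1)
        return 1;

    kill(logger, SIGKILL);
    if (wait_grace(logger, grace_ms) == 1)
        return 2;

    return -1;
}
```

  A real implementation would more likely be driven by SIGCHLD and the
supervisor's existing event loop than by polling, and might use a different
timeout for each stage.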

-- 
  Laurent

Received on Tue Jul 29 2014 - 23:37:47 UTC
