Re: logpipe error handling patch from Jan Pobrislo on 2014-07-30 (supervision)

From: Jan Pobrislo <ccx_at_webprojekty.cz>
Date: Wed, 30 Jul 2014 15:01:16 +0200

On Wed, 30 Jul 2014 00:37:47 +0100
Laurent Bercot <ska-supervision_at_skarnet.org> wrote:

> On 29/07/2014 20:59, Jan Pobrislo wrote:
> > There are two possible resolutions - either make runsv create a new
> > pipe when asked to start up again or make it die since it's unable
> > to spawn a child.
>
> Actually, there's only one possible resolution.
> When runsv is told to exit, it kills the service and closes the log
> pipe, which is impossible to reopen.

Yes, it's impossible to reopen the very same pipe. What's possible is
to create a new pipe for use with newly spawned processes. The existing
processes are shutting down anyway, or at least are supposed to be, so
we do indeed need to do the functional equivalent of runsv restart even
if we create a new pipe.
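A minimal sketch of that idea, in Python for brevity. The `LogPipe` class and its methods are hypothetical, not runsv's actual data structures. The point is only that a closed pipe is never reopened: a fresh `os.pipe()` is created right before new processes are spawned, with the write end meant for the service and the read end for its logger.

```python
import os
from typing import Optional


class LogPipe:
    """Hypothetical holder for the service-to-logger pipe (not runsv code)."""

    def __init__(self) -> None:
        self.read_fd: Optional[int] = None
        self.write_fd: Optional[int] = None
        self.ensure_open()

    def is_open(self) -> bool:
        return self.read_fd is not None and self.write_fd is not None

    def close(self) -> None:
        # What happens on "x": both ends are closed; a pipe cannot be reopened.
        for fd in (self.read_fd, self.write_fd):
            if fd is not None:
                os.close(fd)
        self.read_fd = None
        self.write_fd = None

    def ensure_open(self) -> None:
        # Called right before (re)spawning the service and its logger:
        # the old pipe is gone for good, so create a brand-new one.
        if not self.is_open():
            self.close()  # drop a half-closed pair, if any
            self.read_fd, self.write_fd = os.pipe()
```

Processes still holding the old pipe keep it until they exit; only newly spawned ones get the new pair.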

Note that I don't really like this approach much; simply exiting, as my
first patch does, is much simpler and cleaner IMO.

> At this point, the service is
> dysfunctional and there's no way to recover it: so runsv should not
> be accepting commands anymore, and close its control pipe instead.

I was under the impression that the control pipe is left open so you
can send signals in case the supervised processes have trouble exiting
the normal way.

> runsv could even exit right after sending the SIGTERM to the
> service, that's the quick and easy solution that closes both the log
> pipe and the control pipe. However, it also means runsvdir will
> restart it, and the new runsv instance will try and spawn a new
> logger when the old one may still be hanging around, and that could
> cause conflicts.

I'm trying to build my systems so they can recover from any fraction
of their processes being killed at any time, except PID 1, so this would
have to be handled either way. A good logger here will exit if its lock
is already held, and will only read its input when it knows it can write
it out. But indeed, it's an unnecessary complication.
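The lock half of that can be sketched as follows, assuming a hypothetical lock file inside the logger's directory. It uses `fcntl.flock` with `LOCK_NB`, so a second logger instance fails immediately instead of waiting behind the first one. The exit code 111 is an arbitrary choice for this sketch.

```python
import fcntl
import os
import sys


def acquire_or_exit(lock_path: str) -> int:
    """Take an exclusive lock on lock_path, or exit if someone holds it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking: an old logger still hanging around makes this fail.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        sys.exit(111)  # arbitrary "try again later" code for this sketch
    # Keep fd open for the logger's lifetime; closing it drops the lock.
    return fd
```

Because the lock is released by the kernel when the holder dies, a new logger started by a restarted runsv simply exits (and gets retried) until the old one is really gone.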

Normally one would remove the symlink to the service directory when
sending the x command, so that it doesn't get respawned by runsvdir.
It's still possible, though, for it to be re-added when the service is
marked up again, so such behavior could still occur. To prevent it, we
either need to exit unconditionally (ignoring any up or once commands
after x) or create a new pipe, as described above, once both processes
have exited.
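The first option, sketched as a hypothetical model of the command handling (not runit's code). u, d, o and x are the control characters discussed here; signal commands such as t are simply accepted and not modeled. Once x has been received, u and o are refused, and the supervisor exits as soon as both children are gone.

```python
class Supervisor:
    """Hypothetical model of runsv's reaction to control commands."""

    def __init__(self) -> None:
        self.exit_requested = False
        self.want_up = True
        self.once = False

    def command(self, c: str) -> bool:
        """Apply one control character; return False if it is refused."""
        if self.exit_requested and c in ("u", "o"):
            # The log pipe is already closed, so nothing may be started again.
            return False
        if c == "x":
            self.exit_requested = True
            self.want_up = False
        elif c == "u":
            self.want_up, self.once = True, False
        elif c == "o":
            self.want_up, self.once = True, True
        elif c == "d":
            self.want_up = False
        # Anything else (e.g. signal commands like "t") is accepted but not
        # modeled, so signals still work after "x" for stuck processes.
        return True

    def should_exit(self, service_running: bool, log_running: bool) -> bool:
        # Exit unconditionally once both children are gone after "x".
        return self.exit_requested and not (service_running or log_running)
```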

> A more advanced behaviour is to have potentially
> configurable timeouts in runsv to ensure that the logger finally dies
> and the service can be restarted:

You can look at start-stop-daemon in OpenRC for an example of this.
From its manpage:

These options are only used for stopping daemons:

-R, --retry timeout | signal/timeout
The retry specification can be either a timeout in seconds or
multiple signal/timeout pairs (like SIGTERM/5).

For runsvdir itself I can see --retry HUP/5/TERM/5 being used in the
default initscript.
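For illustration, a parser for that retry syntax might look like the sketch below. It is hypothetical: the handling of a bare timeout (expanding N to TERM/N/KILL/N) is borrowed from Debian's start-stop-daemon documentation, and OpenRC's exact rules may differ.

```python
import signal
from typing import List, Tuple


def parse_retry(spec: str) -> List[Tuple[signal.Signals, int]]:
    """Parse "TIMEOUT" or "SIG/TIMEOUT[/SIG/TIMEOUT...]" into a schedule."""
    fields = spec.split("/")
    if len(fields) == 1:
        # Bare timeout: assumed to mean TERM/N then KILL/N (Debian semantics).
        timeout = int(fields[0])
        return [(signal.SIGTERM, timeout), (signal.SIGKILL, timeout)]
    if len(fields) % 2 != 0:
        raise ValueError(f"expected signal/timeout pairs in {spec!r}")
    schedule = []
    for name, timeout in zip(fields[0::2], fields[1::2]):
        name = name.upper()
        if not name.startswith("SIG"):
            name = "SIG" + name  # accept both HUP and SIGHUP
        try:
            sig = signal.Signals[name]
        except KeyError:
            raise ValueError(f"unknown signal {name!r}") from None
        schedule.append((sig, int(timeout)))
    return schedule
```

With this sketch, "HUP/5/TERM/5" becomes SIGHUP with 5 seconds followed by SIGTERM with 5 seconds.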

You can have configurable shutdown behavior this way, but presumably
this is exactly what should go into service/control/t, though indeed
that wouldn't work for the log. I wonder if there would be any benefit
in making runsv understand a similar syntax, but that'd probably just
make things unnecessarily complex.
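To show what teaching runsv such a syntax would involve, here is a hypothetical sketch that applies a schedule to a child process: send each signal, wait up to its timeout, and move on to the next entry if the process is still alive. It uses Python's `subprocess` for brevity; a real supervisor would presumably fold this into its own event loop instead of blocking in `wait()`.

```python
import signal
import subprocess
from typing import List, Tuple


def stop_with_schedule(proc: subprocess.Popen,
                       schedule: List[Tuple[int, float]]) -> bool:
    """Escalate through (signal, timeout) pairs until proc exits.

    Returns True if the process exited, False if the schedule ran out.
    """
    for sig, timeout in schedule:
        if proc.poll() is not None:
            return True  # already gone, nothing left to send
        proc.send_signal(sig)
        try:
            proc.wait(timeout=timeout)
            return True
        except subprocess.TimeoutExpired:
            continue  # still alive: try the next, usually harsher, signal
    return proc.poll() is not None
```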

The only thing I can think of that exiting unconditionally after x
interferes with is the o (once) command; I presume that when executing
the up and down commands, the ./down file is created/removed
appropriately, so it doesn't matter if runsv is restarted. The once
command is a tricky one in this regard, but I can't conjure up a use
case where it'd matter.
Received on Wed Jul 30 2014 - 13:01:16 UTC