Re: Build Break in s6-rc from Laurent Bercot on 2015-08-14 (skaware)

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Fri, 14 Aug 2015 16:48:52 +0200

On 14/08/2015 01:25, Colin Booth wrote:
> I'm not sure how I feel about having the indestructibility guarantee
> residing in a service that isn't the root of the supervision tree. I
> haven't done much with s6-fdholderd but unless there's some extra
> magic going on in s6rc-fdholderd, if it goes down it won't be able to
> re-establish its control over the overall communications state due to
> it creating a fresh socket. I know, I know, it should be fine, but
> accidents happen.

  I've thought about it for a while, and finally decided that the
advantages overshadowed the drawbacks.

  First, the only time this makes a qualitative difference is when
the pipe maintainer cannot die at all. In one setup, you lose your
pipes when "s6-svscan" dies; in the other setup, you lose your pipes
when "s6-fdholderd" dies. The only way to prevent that is to forbid
your pipe maintainer from dying entirely.
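  To make the "pipe maintainer" idea concrete, here is a minimal sketch in Python. It only illustrates the idea and is not how s6-svscan is implemented. A process that keeps both ends of a pipe open lets a writer die and be replaced without the reader ever seeing EOF. The reader only gets EOF once the maintainer's copy of the write end is gone. The function name demo_pipe_maintainer is hypothetical. Writers are simulated by short-lived forked children, and the maintainer's "death" is simulated by closing its descriptor. A POSIX system is assumed.

```python
import os

def demo_pipe_maintainer():
    # This process plays a (hypothetical) pipe maintainer: it creates the pipe
    # and keeps BOTH ends open for the whole demo.
    r, w = os.pipe()
    for msg in (b"first ", b"second"):
        pid = os.fork()
        if pid == 0:
            # Child: a short-lived writer that inherits the pipe, writes once, and dies.
            os.close(r)
            os.write(w, msg)
            os._exit(0)
        os.waitpid(pid, 0)  # this writer is gone; the next one reuses the same pipe
    os.set_blocking(r, False)
    data = os.read(r, 64)  # output of both writer "incarnations", nothing lost
    try:
        # No writer process is alive, but the maintainer still holds w,
        # so the reader is told "no data yet" (EAGAIN), not EOF.
        eof_while_held = os.read(r, 64) == b""
    except BlockingIOError:
        eof_while_held = False
    os.close(w)  # the maintainer "dies": the last write end goes away...
    eof_after_release = os.read(r, 64) == b""  # ...and now the reader gets EOF
    os.close(r)
    return data, eof_while_held, eof_after_release

print(demo_pipe_maintainer())  # (b'first second', False, True)
```

  Whichever process holds that last copy (s6-svscan, s6-fdholderd, or anything else) is the one whose death "loses" the pipe for any process that restarts afterwards.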

  Second, the only way to do that is to put the pipe maintainer as
process 1; but I don't think putting things in process 1 to make
them indestructible is the answer. It's the systemd way. "We're
process 1, so we cannot die, and we can do everything on the system
that needs reliability."
  Granted, it's a nice thing to have, and I do advocate the use of
s6-svscan as process 1, but not because it's a pipe maintainer. I
use s6-svscan as process 1 because it's the natural place for the
root of a supervision tree; and everything else is a bonus.

  The logged service feature of s6-svscan is a direct legacy of
daemontools. It was very cool at the time because we had nothing
else; and I keep it because there's a large daemontools user base,
and breaking compatibility would not make sense because the code
that handles logged services isn't complex enough to be a
maintenance burden. (And still, it is one of the very few places
where I had to write a detailed comment labelled BLACK MAGIC,
because there *is* some complexity to it.)
  So it's not going away any time soon, but it's still a legacy
ad-hoc functionality. If I were writing s6-svscan today, I would
not implement this feature; I would advertise the use of a
dedicated fd-holder instead. And that would cut the code size of
s6-svscan by a non-negligible amount, getting it closer to the
ideal of the minimal process 1.

  The correct approach to reliability is not to try and force
processes not to die; and it's not to cram more stuff into the
only process that cannot die. It's to make sure it's not a
serious problem when processes die. And that, btw, is exactly
what supervision is about in the first place.

  So, let's make sure it's not a problem when the pipe maintainer
dies. In this case, let's add a watcher for s6-fdholderd.
Instead of oneshots that store pipes into the s6-fdholderd, how
about filling up s6-fdholderd at start time with all the pipes
it needs? The processes in a pipeline will keep using the old
pipes until one of them dies, at which point the old pipe will
close, propagating the EOF or EPIPE to the other processes in
the pipeline; eventually all the processes in the pipeline will
restart, and fetch the new set of pipes from s6-fdholderd.
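  The start-time filling and later fetching can be sketched with descriptor passing over a Unix domain socket (SCM_RIGHTS). Python 3.9+ exposes this as socket.send_fds() and socket.recv_fds(). This is a toy under stated assumptions. ToyFdHolder, fetch() and the "pipe:<name>:r" / "pipe:<name>:w" identifiers are hypothetical. The holder runs in-process over a socketpair. None of it reproduces s6-fdholderd's actual protocol or s6-rc's identifier scheme.

```python
import os
import socket

class ToyFdHolder:
    """In-process toy fd-holder; NOT s6-fdholderd's actual protocol or identifiers."""

    def __init__(self, pipelines):
        # Fill up at start time: one fresh pipe per (hypothetical) pipeline name.
        self.store = {}
        for name in pipelines:
            r, w = os.pipe()
            self.store[f"pipe:{name}:r"] = r
            self.store[f"pipe:{name}:w"] = w

    def serve_one(self, conn):
        # Read one identifier and reply with the stored fd as SCM_RIGHTS data.
        ident = conn.recv(256).decode()
        socket.send_fds(conn, [b"ok"], [self.store[ident]])

def fetch(holder, ident):
    # A (re)starting process asks for a pipe end by identifier. Both sides run
    # in this one process over a socketpair, purely to keep the sketch small.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with client, server:
        client.sendall(ident.encode())
        holder.serve_one(server)
        msg, fds, _flags, _addr = socket.recv_fds(client, 16, 1)
    if msg != b"ok" or len(fds) != 1:
        raise RuntimeError(f"fetch failed for {ident!r}")
    return fds[0]  # a new descriptor referring to the same pipe

holder = ToyFdHolder(["demo"])
w = fetch(holder, "pipe:demo:w")
r = fetch(holder, "pipe:demo:r")
os.write(w, b"hello")
print(os.read(r, 5))  # b'hello'
os.close(w)  # the writer "dies"...
w = fetch(holder, "pipe:demo:w")  # ...and its restarted instance fetches the same pipe
os.write(w, b"again")
print(os.read(r, 5))  # b'again'
```

  The received descriptor is a new fd number for the same pipe, so the holder keeps its own copy no matter what its clients do with theirs.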

  That sounds reliable to me, and even cleaner than the current
approach, where the services can't reliably restart if
s6-fdholderd has died; and it doesn't need additional
autogenerated oneshots. (Thanks for the rubber duck debugging!
That's a huge part of why I like design discussions.)

  So yeah, if s6-fdholderd dies, and one process in a pipeline
dies, then the whole pipeline will restart. I think it's an
acceptable price to pay, and it's the best we can do without
involving process 1.
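  The restart cascade relies only on ordinary pipe semantics. Once no process holds the write end, readers get EOF. Once no process holds the read end, writers get EPIPE. Below is a minimal Python sketch with hypothetical function names, assuming a POSIX system. A dying peer is simulated by closing its descriptor in the same process.

```python
import os

def reader_sees_eof():
    r, w = os.pipe()
    os.close(w)  # the only holder of the write end "dies"
    eof = os.read(r, 1) == b""  # the reader gets EOF instead of blocking
    os.close(r)
    return eof

def writer_sees_epipe():
    r, w = os.pipe()
    os.close(r)  # the only holder of the read end "dies"
    try:
        os.write(w, b"x")
        return False
    except BrokenPipeError:  # EPIPE (CPython ignores SIGPIPE by default)
        return True
    finally:
        os.close(w)

print(reader_sees_eof(), writer_sees_epipe())  # True True
```

  Once a restarted s6-fdholderd no longer holds the old pipes, the pipeline's own processes are the only holders. For pipeline members that exit on EOF or EPIPE, these two conditions are what carry one death through the whole pipeline.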

-- 
  Laurent

Received on Fri Aug 14 2015 - 14:48:52 UTC
