On 16/07/2015 19:22, Colin Booth wrote:
> You're right, ./run is up, and being in ./finish doesn't count as up.
> At work we use a lot of runit and have a lot more services that do
> cleanup in their ./finish scripts so I'm more used to the runit
> handling of down statuses (up for ./run, finish for ./finish, and down
> for not running). My personal setup, which is pretty much all on s6
> (though migrated from runit), only has informational logging in the
> ./finish scripts so it's rare for my services to ever be in that
> interim state for long enough for anything to notice.
I did some analysis back in the day, and my conclusion was that
admins really wanted to know whether their service was up as opposed
to... not up; and the finish script is clearly "not up". I did not
foresee a situation like a service manager, where you would need to
wait for a "really down" event.
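To make that concrete, here is a minimal sketch with the current tools
(the servicedir path is just a placeholder):

  # Ask s6-supervise to take the service down, then wait for the "down" event.
  s6-svc -d /service/myservice
  s6-svwait -d /service/myservice
  # s6-svwait -d returns as soon as ./run has died;
  # ./finish may still be running at that point.

There is no equivalent for "really down", which is exactly the gap a
service manager runs into.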
> As for notification, maybe 'd' for when ./run dies, and 'D' for when
> ./finish ends. Though since s6-supervise SIGKILLs long-running
> ./finish scripts, it encourages people to do their cleanup elsewhere
> and as such removes the main reason why you'd want to be notified
> when your service is really down. If the s6-supervise timer wasn't
> there, I'd definitely suggest sending some message when ./finish went
> away.
Yes, I've gotten some flak for the decision to put a hard time limit
on ./finish execution, and I'm not 100% convinced it's the right
decision - but I'm almost 100% convinced it's less wrong than just
allowing ./finish to block forever.
./finish is a destroyer, just like close() or free(). It is nigh
impossible to define sensible semantics that allow a destroyer to fail,
because if it does fail, what do you do? void free() is the right
prototype; int close() is a historical mistake.
The same goes for ./finish: nobody tests ./finish's exit code, and
that's okay. But since ./finish is a user-provided script, it has many
more failure modes than just exiting nonzero - in particular, it can
hang (or simply run for ages). The problem is that while it's alive,
the service is still down, and that's not what the admin wants.
Long-running ./finish scripts are almost always a mistake. And that's
why s6-supervise kills ./finish scripts so brutally.
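For the record, the kind of ./finish that stays comfortably under any
reasonable timer is one that only logs and exits - something like this
sketch (assuming the usual convention of s6-supervise passing ./run's
exit code and signal number as arguments):

  #!/bin/sh
  # ./finish: purely informational, no long cleanup.
  # $1 is ./run's exit code (256 if it was killed by a signal),
  # $2 is the signal number in that case.
  echo "run ended: exitcode=$1 signal=$2"
  exit 0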
I think the only satisfactory answer would be to leave it to the user:
keep killing ./finish scripts on a short timer by default, but have
a configuration option to change the timer or remove it entirely. And
with such an option, a "burial notification" when ./finish ends becomes
a possibility.
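If that option took the form of a per-servicedir file - a purely
hypothetical name and format here, milliseconds with 0 meaning no
limit - configuring it could look like:

  # Hypothetical: allow ./finish ten seconds instead of the default
  echo 10000 > /service/myservice/timeout-finish
  # Hypothetical: never kill ./finish
  echo 0 > /service/myservice/timeout-finish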
> Ah, gotcha. I was sending explicit timeout values in my s6-rc commands,
> not using timeout-up and timeout-down files. Assuming -tN is the
> global value, then passing that along definitely makes sense, if
> nothing else to bring its behavior in line with that of
> timeout-up and timeout-down.
Those pesky little s6-svlisten1 processes will get nerfed.
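For clarity, the two ways of expressing a timeout that we are talking
about - the command-line option and the per-service files read at
compile time - would look roughly like this (paths are illustrative):

  # Global timeout on the command line, in milliseconds:
  s6-rc -t 10000 -u change myservice
  # Per-service timeouts, as files in the service's source definition
  # directory (illustrative location), also in milliseconds:
  echo 10000 > /etc/s6-rc/source/myservice/timeout-up
  echo 10000 > /etc/s6-rc/source/myservice/timeout-down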
> Part of my job entails dealing with development servers where
> automatic deploys happen pretty frequently but service definitions
> don't change too often. So having non-privileged access to a subsection
> of the supervision tree is more important than having non-privileged
> access to the pre- and post- compiled offline stuff.
I understand. I guess I can make s6-rc-init and s6-rc 0755 while
keeping them in /sbin, where Joe User isn't supposed to find them.
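Concretely, something along these lines:

  # World-executable, but outside the default PATH of ordinary users:
  chmod 0755 /sbin/s6-rc-init /sbin/s6-rc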
> By the way, that's less secure than running a full non-privileged
> subtree.
Oh, absolutely. It's just that a full setuidgid subtree isn't very
common - but for your use case, a full user service database makes
perfect sense.
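For readers following along, a rough sketch of such a setup - a run
script in the root supervision tree that hands a whole scan directory
to an unprivileged user (user name and paths are placeholders):

  #!/bin/sh
  # ./run for a service in the root tree: drop privileges to user "joe"
  # and supervise his own scan directory.
  exec s6-setuidgid joe s6-svscan /home/joe/service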
--
Laurent