Re: s6-rc transition failures

From: Van Bemten, Lionel (Nokia - BE/Antwerp)
Date: Fri, 16 Jun 2017 11:13:57 +0000

Hello Laurent,

Thanks for your answers. Other opinions/experience welcome.

> That is a fair point. Normally, you should adjust the s6-rc
> timeouts (both the global one and the service-specific one) to
> make sure s6-rc does *not* time out before the service is ready -
> but if there's an unexpected significant delay, the situation can
> happen.

Just to be clear, I am talking about a service going into an infinite
loop or deadlock. Obviously that is a bad service, but I want to
protect my system against it.

> What I can do is add an option to s6-rc to make it explicitly send
> a s6-svc -d to a service that times out before reaching readiness:
> ensure that a service is either ready in time, or definitely down.
> Would that help?

Yes, that would help. I suppose you also mean to wait for the
service to go down before returning?
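
To make the "ready in time, or definitely down" behaviour concrete,
here is a rough sketch in Python of what I have in mind. It is only
an illustration, not how s6-rc does or would do it. It assumes the
s6-svc options -wU, -wD and -T behave as documented (wait for
readiness, wait for down plus finish script, timeout in milliseconds);
the function name and the injectable "run" argument are mine.

```python
import subprocess


def up_or_down(servicedir, timeout_ms, run=subprocess.run):
    """Try to bring a longrun up; if it is not ready in time, bring it down.

    Returns True if the service became ready within timeout_ms,
    False if it was brought back down instead.
    """
    # -u: bring up; -wU: wait for readiness; -T: give up after timeout_ms.
    up = run(["s6-svc", "-wU", "-T", str(timeout_ms), "-u", servicedir])
    if up.returncode == 0:
        return True
    # Not ready in time: send "down" and block until the service is
    # really down (-wD also waits for the finish script to exit).
    run(["s6-svc", "-wD", "-d", servicedir])
    return False
```

Called as up_or_down("/run/service/foo", 60000), where the path is a
made-up example, it only returns once the service is either ready or
down, which is the guarantee I am after.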

> The annoying thing is it can't be symmetrical: when a down
> transition times out, there's no way I'm going to start the service
> again. :) But generally, a down transition timing out signifies a
> badly written finish script, or badly calibrated timeouts, and
> it can be easily solved by running s6-rc -d change again.

I agree. I would add that if timeout-down > timeout-kill + timeout-finish
+ some margin, the down transition should generally never time out.
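
As a sanity check, the rule can be written down as a small Python
helper. This is only a sketch under my assumptions: all values are in
milliseconds, a value of 0 means "no timeout", and the 1000 ms default
margin is an arbitrary number I picked for illustration.

```python
def down_timeout_is_safe(timeout_down, timeout_kill, timeout_finish,
                         margin=1000):
    """Check that timeout-down leaves room for kill + finish + margin.

    All values are in milliseconds; 0 is taken to mean "no timeout".
    """
    if timeout_down == 0:
        # s6-rc waits indefinitely, so the transition cannot time out.
        return True
    if timeout_kill == 0 or timeout_finish == 0:
        # The service or its finish script may run for an unbounded
        # time, so no finite timeout-down is guaranteed to be enough.
        return False
    return timeout_down > timeout_kill + timeout_finish + margin
```

For example, with timeout-kill = 10000 and timeout-finish = 5000, a
timeout-down of 20000 passes the check and a timeout-down of 15000
does not.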

> What I can do is add a bit of signal handling to s6-rc, so that if
> it gets interrupted, say with a SIGINT or SIGTERM, it exits ASAP,
> while still ensuring consistency of the service states.

I was thinking exactly the same :). I even think this could be tailored
to system shutdown (I do not see another use case). E.g. for ongoing
longrun up transitions, s6-rc could act as if the transition timed out
and send "s6-svc -d". For ongoing longrun down transitions I am not
sure whether it should wait for it to complete or not.

> Unfortunately, for oneshots it would mean waiting for the current
> transitions to finish before exiting - s6-rc has no way to interrupt
> a running oneshot, and adding one (making s6rc-oneshot-runner kill
> all its children) would not help, because until the oneshot script
> exits, it is not visible from the outside whether it has accomplished
> its transition or not - so the state would still be undetermined.

I tend to think this would not be too much of a problem as I picture
oneshots as having timeout-up and timeout-down of a few seconds,
as opposed to longruns having timeouts of one or two minutes.
But this assumption may be totally wrong.
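
Putting the pieces of this discussion together, the policy on
SIGINT/SIGTERM could be sketched as below, in Python. This is only my
reading of the idea, not an s6-rc design: the function name and the
returned strings are invented, and the wait_for_down flag stands for
the open question about ongoing longrun down transitions.

```python
def action_on_interrupt(kind, direction, wait_for_down=True):
    """Decide what to do with one ongoing transition when interrupted.

    kind is "oneshot" or "longrun"; direction is "up" or "down".
    """
    if kind == "oneshot":
        # A running oneshot cannot be interrupted with a known
        # outcome, so the only consistent choice is to wait for it.
        return "wait"
    if direction == "up":
        # Act as if the up transition timed out: bring the longrun down.
        return "send-down"
    # Ongoing longrun down transition: undecided, hence the flag.
    return "wait" if wait_for_down else "abandon"
```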

> Also, state consistency cannot be 100% ensured, because s6-rc could
> still receive a SIGKILL - but if you kill -9 s6-rc, you deserve
> trouble.

I won't kill -9 s6-rc, I promise.

Received on Fri Jun 16 2017 - 11:13:57 UTC