Re: runit SIGPWR support from Laurent Bercot on 2020-02-18 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Tue, 18 Feb 2020 09:39:14 +0000

>absolutely right, totally agreed.
>i also wondered why he refuses to add this.
>just catch and handle ALL possible signals, including the RT signals
>and leave it to the user how to react.

In the github issue you linked, I explained my exact reasoning.
An additional reason is that signaling init is not a casual operation;
instead it's part of a very limited API between the kernel and user
space, to be used in very controlled, exhaustively listed, situations.

>sorry Laurent, this is absolutely ridicolous.
>we are talking about using s6 as Linux process #1

No, that's not what we were talking about. We were talking about using
runit as pid 1 in a container. I just used s6 and SIGPWR-as-sent-by-lxd
as an illustration of why patching software is always more complicated
than using configuration switches. And I stand by my point.

Now, *as a separate conversation*, you can say that s6-svscan should
be able to handle every signal that the kernel can throw at it, no
matter how unportable. And it is a reasonable request: there are good
arguments for it. But the case for SIGPWR *is not* "that is the signal
sent by lxd when it wants to shut down a container"! The case for SIGPWR
is "the kernel may send this in the event of a power failure".

You may find that the difference is asinine, and that I'm splitting
hairs; but I'm really not, and the difference is subtle but important.

In the latter case, the kernel takes precedence over init, the kernel
decides what the API is and init must adapt. If the kernel says "when
I get a power failure, I send you SIGPWR", init cannot say "uh, no,
I wish you'd send SIGUSR2 instead". Shut up and handle SIGPWR.

In the former case, lxd *emulates* a kernel, and it is lxd that must
adapt to whatever init runs in the container, following that init's
existing conventions. And that's exactly why the lxc.signal.halt
configuration switch exists!

Now, "stop the machine" is not a signal that a kernel would send on
its own. The decision to power off the machine comes from the admin,
usually via a "shutdown" command or equivalent. And here's the thing:
there is *no universal convention* on the API that a "shutdown"
command must follow. None.
Some inits use SIGTERM for that. Others use SIGUSR1. Others use SIGUSR2.
Others use a totally different mechanism and don't send a signal to
init at all. systemd, always being a special snowflake, uses SIGRTMIN+3
and SIGRTMIN+4, because any other choice made way too much sense.

None of them uses SIGPWR, and for a good reason: SIGPWR does not mean
"the admin requested a system shutdown", it means "power failure". And
it is very possible that the action implemented by the system in case
of a power failure is very different from a shutdown: it could be a
suspend-to-disk, for instance (which is faster than a full shutdown, and
when the power fails you want to save your data *fast*). So, even for
inits that actually understand SIGPWR - and most of them actually do -
SIGPWR is a *terrible* default choice of signal to send as a shutdown
request. It already has a use, and the use is not a normal shutdown.
Arguably, lxc.signal.halt should *always* be set to something else, be
it SIGTERM, SIGUSR1, SIGUSR2, or even, lol, SIGRTMIN+3.
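
A minimal sketch of such an override, assuming a plain LXC container
configuration file and an init that treats SIGUSR2 as a poweroff
request (both are assumptions; use whatever signal your init actually
understands):

```
# Signal sent to the container's init on a clean shutdown request.
# The LXC default is SIGPWR.
lxc.signal.halt = SIGUSR2
```

Under lxd the same line can, as far as I know, be passed through the
raw.lxc configuration key.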

So, if you're asking me to implement SIGPWR support in s6 because that's
what lxd sends by default to signal a container shutdown, I will laugh
at you, because you are being, uh, "ridicolous". On the other hand, if
you're telling me that s6-svscan needs to understand SIGPWR in case the
kernel wants to signal a power failure, you actually have a good point,
and yes, I should implement SIGPWR support on platforms where this
signal exists.

--
Laurent

Received on Tue Feb 18 2020 - 09:39:14 UTC
