Re: runit SIGPWR support from Jonathan de Boyne Pollard on 2020-02-29 (supervision)

From: Jonathan de Boyne Pollard <J.deBoynePollard-newsgroups_at_NTLWorld.COM>
Date: Sat, 29 Feb 2020 13:44:35 +0000

Alex Suykov:

> Just for reference, I checked apcupsd sources.
>

That was not nearly enough research, and your conclusion is ill-founded.

The SIGPWR signal, from before Linux even existed, has always meant
"something has happened with the power". In AT&T Unix System 5 books it
is conventionally described as "power fail/restart". The "something"
can include *that the power has been restored*. And indeed, this has
been its long-time use, on AT&T Unix and on Linux. Miquel van
Smoorenburg's original powerd daemon (which later became Tom Webster's
genpowerd), for just one example, sent a SIGPWR to process 1 and
beforehand wrote *what that meant* into /etc/powerstatus, which could be
"LOW", "FAIL", or "OK". A. B. Prasad's powerh, a trap handler for
snmptrapd, did the same. And there are several others over the years.

* https://linuxgazette.net/issue83/prasad.html
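
As a rough illustration of that two-step protocol (a sketch only, not
code from powerd or genpowerd): record what happened in the status file,
and only then send the signal. The function name, its parameters, and
the write-then-rename step are assumptions made for the example; the
three status words are the ones named above.

```python
import os
import signal
import tempfile

VALID_STATES = ("OK", "FAIL", "LOW")


def announce_power_event(state, status_path, pid):
    """Record *what* happened, then signal *that* something happened."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown power state: {state!r}")
    # Write to a temporary file and rename it into place, so that the
    # receiver never sees a half-written status file (an assumption of
    # this sketch, not something the original daemons are known to do).
    directory = os.path.dirname(status_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(state + "\n")
    os.replace(tmp_path, status_path)
    # Only now send the signal; on its own it carries no state at all.
    os.kill(pid, signal.SIGPWR)
```

A UPS monitor running as the superuser would call something like
announce_power_event("FAIL", "/etc/powerstatus", 1). The path and the
process ID are parameters here so that the sketch can be tried against
an ordinary process and a scratch file.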

(Indeed, on several operating systems, including HP-UX and SunOS, SIGPWR
sent to a process other than process #1 *only* meant "power has been
restored", and did not have the power fail meaning at all. Programs
were expected to reinitialize after power comes back in their SIGPWR
handlers. A full-screen TUI program would re-draw its UI, for example,
on the assumption that the local terminals had lost power as well and
were currently showing blank screens.)
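
A minimal sketch of that convention, for illustration: the handler only
notes that power came back, and the main loop does the repainting. The
Screen class and its members are hypothetical stand-ins for a real TUI
program, and signal.SIGPWR is only available in Python on platforms that
define the signal.

```python
import signal


class Screen:
    """Hypothetical stand-in for a full-screen TUI program."""

    def __init__(self):
        self.needs_redraw = False
        self.redraw_count = 0

    def on_sigpwr(self, signum, frame):
        # Do as little as possible in the handler: just set a flag.
        self.needs_redraw = True

    def poll(self):
        # Called from the main loop; a real program would repaint the
        # whole display here, assuming the terminal now shows nothing.
        if self.needs_redraw:
            self.needs_redraw = False
            self.redraw_count += 1


def install(screen):
    """Treat SIGPWR as "power has been restored" for this process."""
    signal.signal(signal.SIGPWR, screen.on_sigpwr)
```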

It is quite wrongheaded to think that this can be completely changed
into a "shut down and power off" command, for that reason and others
besides. Other reasons include that existing systems *already define
different signals to mean that*. There is an existing mechanism
already. Several, in fact. They need more than one signal, too.
systemd (and the nosh toolset's system-manager which has compatible
signalling here) uses two of the real-time signals to mean "shut down
and power off", one to trigger the service changes, and the other to
finalize the procedure. There's no convincing case for either the
systemd authors or me to change, given that SIGPWR has already meant
something else for a long time and changing it would conflict with
existing programs that we want to interoperate with. There's a strongly
convincing case to *avoid* changing SIGPWR, in fact.
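
For concreteness, a sketch of what sending those two signals looks
like. The offsets from SIGRTMIN are as recalled from the systemd(1)
manual page and should be treated as assumptions to check against your
own version's documentation; request_poweroff is a hypothetical helper,
not part of any toolset.

```python
import os
import signal

# Assumed offsets (see systemd(1)); verify before relying on them.
POWEROFF_START = signal.SIGRTMIN + 4    # begin the orderly service changes
POWEROFF_FINAL = signal.SIGRTMIN + 14   # finalize: power off immediately


def request_poweroff(pid, final=False):
    """Send one of the two power-off signals to the given process."""
    signum = POWEROFF_FINAL if final else POWEROFF_START
    os.kill(pid, signum)
    return signum
```

Neither of these is SIGPWR, which is the point: the command already has
its own signals, and the event keeps its own.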

The simple truth is that if you are using SIGPWR to mean only a "power
failed" event, you aren't using it correctly. van Smoorenburg init went
and looked into /etc/powerstatus and acted accordingly, and still does
so *to this day*, albeit that the file is now located under /var/run.
(The systemd people and I both provide infrastructure only. We define a
target that is activated, but not what services that target invokes; and
we leave it up to the third-party services to determine what the power
change event actually is.) Sending a SIGPWR to process 1 without first
writing /run/powerstatus means that you'll get the last power change
event set by the person that *did* use the signal properly.
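
To illustrate why the file matters, here is a sketch of the receiving
side under simple assumptions. It is not the logic of any particular
init; the function name and the event names it returns are
hypothetical, and only the three status words come from the discussion
above.

```python
def read_power_event(status_path):
    """Work out what a just-received SIGPWR meant from the status file."""
    try:
        with open(status_path) as f:
            word = f.read().strip().upper()
    except FileNotFoundError:
        # The signal alone says only "something happened with the power".
        return "unknown"
    events = {
        "OK": "power-restored",
        "FAIL": "power-failed",
        "LOW": "battery-low",
    }
    return events.get(word, "unknown")
```

If a sender signals without rewriting the file, a receiver like this one
simply reports whatever the previous sender left there.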

Moreover, not only is there an existing mechanism in such programs,
there are *several* existing mechanisms. And they don't even all use
signals. The program being run could be anything, not just
system-manager, or runit-init, or systemd, or finit, or even one of the
several specialized programs designed solely to be process #1 of a
container. *Even just those* do not all use signals.

* https://unix.stackexchange.com/a/191875/5132

In the case of van Smoorenburg init and Joachim Nilsson's finit, there
is a private API for commanding system state changes, idiosyncratic to
just those programs (and not compatible even across the twain),
involving sending messages along a FIFO that process #1 is reading the
other end of. Ironically, Karl M. Hegbloom augmented genpowerd back in
1998 to optionally use this API, which is more expansive than the
comparatively small signal API that van Smoorenburg init has. To
properly use van Smoorenburg init or Nilsson finit in a container, you
have to configure the container manager to *do something else instead of
sending signals at all* to send it system management commands such as
"shut down and halt/restart/poweroff/powercycle", as the API for
commanding these is not signals. The right place to handle all of these
variances is in the container manager's configuration settings.

* https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=23007#25

If you want to have some sort of "shut down and
halt/restart/poweroff/powercycle" API between a container manager and
process #1 inside a container (a slightly absurd idea for something
that isn't a fully-fledged virtual machine with a virtual power
supply), it is the program that is run that dictates the protocol to
you, *not* you that dictate to the program being run. The correct
course of action is to tailor the container manager to the particular
program, not to try to make all programs, decades after the fact,
suddenly discard and incompatibly overwrite the long-established
meaning of SIGPWR as an event, not a command.
Received on Sat Feb 29 2020 - 13:44:35 UTC
