Hi Tom,
>def restart_daemon():
> # kill demon gracefully
> system("/opt/s6/bin/s6-svc -T 180000 -wd -d /projectname/service/servicename")
> # kill demon
> system("/opt/s6/bin/s6-svc -k /projectname/service/servicename")
> # restart demon
> system("/opt/s6/bin/s6-svc -T 180000 -wu -u /projectname/service/servicename")
Note that s6 *can* send a graceful kill first and a violent kill later:
look for the timeout-kill file in
https://skarnet.org/software/s6/servicedir.html
However, the kill signal is only sent to the main process, not to the
process group, so this feature is only useful when the main process
itself doesn't exit gracefully on receipt of its down-signal, which is
not the case here.
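For reference, a minimal sketch of setting that timeout. The 180000 ms value simply mirrors the -T value in your function, and a temporary directory stands in for the real service directory; both are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical location: a temporary directory stands in for
# /projectname/service/servicename so the sketch can run anywhere.
svcdir=${SVCDIR:-$(mktemp -d)}

# timeout-kill holds a number of milliseconds. After s6-svc -d, if the
# main process is still alive once that time has elapsed, s6-supervise
# sends it a SIGKILL (to the main process only, not its process group).
echo 180000 > "$svcdir/timeout-kill"
```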
>I upgraded s6 to the most recent version 2.13.2.0, which supports the “-K” option (for kill the whole process group) and changed the corresponding line in the function.
>In order to test the change I replaced the demon with a process that does nothing but spawn a child process which does nothing except sleep endlessly.
>When executing the s6-svc commands in sequence the daemon’s child did not,
>however, get killed.
What happened here is that your main process successfully terminated
on your initial SIGTERM. The service was then considered down. When you
tried s6-svc -K, no signal was sent to the process group, because
s6-supervise only sends signals when the service is *up* - otherwise it
considers there's nothing to send the signal to! So s6-svc -K had no
effect.
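To make that visible from a script, something like the following sketch could check the service state before sending the signal. kill_group_if_up is a hypothetical helper name, the S6_BIN default is taken from the paths in your function, and the sketch assumes that s6-svstat -u prints "true" while the service is up.

```shell
#!/bin/sh
# Directory holding the s6 binaries (default taken from the quoted function).
S6_BIN=${S6_BIN:-/opt/s6/bin}

# Hypothetical helper: only send the process-group kill when the service
# is up, since s6-supervise sends no signal for a service that is down.
kill_group_if_up() {
  svc=$1
  # Assumption: s6-svstat -u prints "true" when the service is up.
  if [ "$("$S6_BIN/s6-svstat" -u "$svc" 2>/dev/null)" = "true" ]; then
    "$S6_BIN/s6-svc" -K "$svc"
  else
    echo "service is down: s6-svc -K would have no effect" >&2
    return 1
  fi
}
```

Usage would be: kill_group_if_up /projectname/service/servicename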
>Ideally we could send SIGTERM with a timeout and then send SIGKILL to the whole process group in one operation so that the process group is not forgotten. Do you have any suggestions?
As Hoël said, the problem here is that your daemon behaves *almost*
correctly, with a main process that dies gracefully when told to - but
sometimes leaves behind children that don't die. So it's not a
signalling problem, but a *cleanup* problem, and indeed, cleaning up
is the job of a finish script.
So what you want is to write a ./finish script for your service -
s6-supervise will run it as soon as your main process dies. What it
needs to do is give children some time to die, then send a SIGKILL to
the process group to clean up. The pgid is given as the 4th argument
to the finish script, as documented on the same page:
https://skarnet.org/software/s6/servicedir.html
So an example finish script could be:

#!/bin/sh
sleep 2
kill -9 -- -"$4"

if you want to give 2 seconds for children to exit gracefully.
(Don't forget the - before "$4": this is what tells the kill command
to kill a process group rather than a single process.)
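A variant that polls instead of sleeping for a fixed time could look like the sketch below. cleanup_group and its retry count are hypothetical names and values, not part of s6; the sketch relies on kill -0 to test whether any process is left in the group, and with 2 one-second retries it stays well under 5 seconds.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds (default 2) for process
# group $1 to empty out, then SIGKILL whatever is left in it.
cleanup_group() {
  pgid=$1
  tries=${2:-2}
  while [ "$tries" -gt 0 ]; do
    # kill -0 sends no signal; it fails once the group has no members left.
    kill -0 -- -"$pgid" 2>/dev/null || return 0
    sleep 1
    tries=$((tries - 1))
  done
  # Grace period is over: kill the whole group (note the leading -).
  kill -9 -- -"$pgid" 2>/dev/null
  return 0
}

# In a ./finish script, the process group id is the 4th argument.
if [ -n "${4:-}" ]; then
  cleanup_group "$4" 2
fi
```

The benefit over a fixed sleep is only that the script returns as soon as the children are gone; the end result is the same.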
Note that if you want to give them 5 seconds or more, you will need
to adjust the authorized lifetime of a finish script via the
timeout-finish file. (echo 30000 > timeout-finish to allow a finish
script to run for 30 seconds before being killed by s6-supervise.)
And that, normally, should solve your issue.
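With a finish script in place, the restart sequence no longer needs the separate kill step. A minimal sketch, reusing only the s6-svc options from your function and written in shell rather than in your original language; the S6_BIN and SVC variables are hypothetical conveniences whose defaults are the paths you quoted.

```shell
#!/bin/sh
# Defaults taken from the quoted function; override for other setups.
S6_BIN=${S6_BIN:-/opt/s6/bin}
SVC=${SVC:-/projectname/service/servicename}

restart_daemon() {
  # Bring the service down and wait up to 180 s for the main process
  # to die. s6-supervise then runs ./finish, which cleans up the group.
  "$S6_BIN/s6-svc" -T 180000 -wd -d "$SVC" || return 1
  # Bring the service back up and wait up to 180 s for it to be up.
  # s6-supervise only starts ./run again after ./finish has exited.
  "$S6_BIN/s6-svc" -T 180000 -wu -u "$SVC"
}
```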
Mitigation for misbehaving daemons is an area where s6 doesn't shine
for its clarity / ease of use, because such mitigations are difficult
to implement portably and were added as afterthoughts - sorry about that.
The process group mitigation was a recent addition, and I'm glad it's
going to see some use. The next version of s6 will also have mitigation
for another common misbehaviour 🙂
--
Laurent
Received on Wed Nov 05 2025 - 22:48:53 CET