>I'm running an ubuntu:20.04 container with s6-overlay amd64 version 2.2.0.3, which I believe ships the latest s6. Everything runs on an Ubuntu 18.04 host. It looks like s6-svscan sends SIGINT or SIGTERM to the processes and then uses s6-svwait to wait for them to exit, but the zombie processes are never reaped.
Hi Daniel,
I'm actually not the maintainer of s6-overlay: John is. I think the
correct place to describe your issue is GitHub where s6-overlay is
hosted.
I am aware that there is a race condition problem with zombies in the
shutdown sequence of s6-overlay. This is not the first time it has come
up (at some point broken kernels were also causing similar troubles, but
this is probably not what is happening here).
For instance, I know that the line at
https://github.com/just-containers/s6-overlay/blob/master/builder/overlay-rootfs/etc/s6/init/init-stage3#L53
is incorrect: s6-svwait cannot run correctly when the supervision tree
has been torn down, which is the case in init-stage3. This is why the
s6-svwait programs are waiting until they time out: even though the
services they're waiting for are down, they're never triggered because
the associated s6-supervise processes, which perform the triggers, are
already dead.
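As a toy analogy only (Python threads stand in for processes, the function and variable names are hypothetical, and this is not how s6 is implemented), the failure mode can be modelled like this: the waiter is only ever woken by a notification that the supervisor sends, so if the supervisor is already gone, the service going down produces no notification and the waiter runs into its timeout.

```python
import threading


def run_scenario(supervisor_alive: bool, timeout_s: float = 0.5) -> bool:
    """Toy model of a waiter that depends on a supervisor for notifications.

    Returns True if the waiter was notified, False if it timed out.
    Hypothetical illustration: threads stand in for s6-supervise and
    s6-svwait; no s6 code or API is involved.
    """
    service_down = threading.Event()  # the service process has exited
    notified = threading.Event()      # the trigger the waiter blocks on

    def supervisor() -> None:
        # Only a live supervisor observes the death and fires the trigger.
        service_down.wait()
        notified.set()

    if supervisor_alive:
        threading.Thread(target=supervisor, daemon=True).start()

    # The service goes down in both scenarios...
    service_down.set()
    # ...but the waiter only learns about it through the supervisor.
    return notified.wait(timeout_s)
```

With the supervisor alive, run_scenario(True) returns True almost immediately; with it dead, run_scenario(False) returns False only after the full timeout, even though the service is down in both cases.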
Unfortunately, fixing this requires a significant rewrite of the
s6-overlay shutdown sequence. I have started working on this, but it has
been preempted by another project, and will likely not come out before
2022. I'm sorry; I would like to provide the correct shutdown sequence
you're looking for (and that is entirely possible to achieve with s6),
but as things stand, we have to make do with the current sequence.
A tweak I would try is replacing the whole foreground block at lines
48-55 with the following lines, not wrapped in a foreground block:
  backtick -D 3000 -n S6_SERVICES_GRACETIME { printcontenv S6_SERVICES_GRACETIME }
  importas -u S6_SERVICES_GRACETIME S6_SERVICES_GRACETIME
  wait -t ${S6_SERVICES_GRACETIME} { }
With this change, init-stage3 simply waits for all processes to die
(for at most S6_SERVICES_GRACETIME milliseconds, 3000 by default)
before continuing, instead of waiting for a trigger that will never
come.
It is not a long-term solution, though: if you have, for instance, a
shell running in your container, the "wait" command will block until
it times out. But it may be helpful in your situation.
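For readers less familiar with execline's "wait": with an empty block and -t, it waits for all children of the current process, giving up after the timeout in milliseconds. Assuming, as in s6-overlay v2, that init-stage3 runs as PID 1, orphaned processes are reparented to it, so this also reaps the zombies. A rough Python equivalent of that reaping loop, sketched only to illustrate the semantics (the function name is hypothetical and this is not s6-overlay code), could look like this:

```python
import os
import time


def wait_all_children(timeout_ms: int, poll_interval: float = 0.01) -> bool:
    """Reap every child of the calling process, giving up after timeout_ms.

    Hypothetical sketch of "wait -t timeout { }" semantics.
    Returns True if no children remain, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        try:
            # Reap any child that has already exited, without blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: there are no children left at all.
            return True
        if pid != 0:
            # One zombie reaped; immediately look for more.
            continue
        # Children still exist, but none has exited yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Called as wait_all_children(3000), it returns True once every child has been reaped, or False after three seconds if something, such as an interactive shell, is still alive; that is the same limitation described above.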
Please open a GitHub issue to discuss this.
--
Laurent
Received on Wed Jul 21 2021 - 23:27:35 CEST