Hello, all. I'm using s6 as the init process manager in a Docker 
container, via s6-overlay. Everything works fine, but when I send a 
SIGINT to the container, the managed processes exit but become 
zombies that are never reaped, forcing the system to time out (twice, 
actually).
The container image is ubuntu:20.04 with s6-overlay amd64 version 
2.2.0.3, which I believe has the latest s6; the host is an Ubuntu 18.04 
system. It looks like s6-svscan sends SIGINT or SIGTERM to the 
processes, and then uses s6-svwait to wait for them to exit, 
but the zombie processes are never reaped.
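For reference, "reaping" here means the parent (or PID 1, for orphans) calling wait() on a child that has exited; until that happens the child stays in the process table as a zombie. A minimal Python sketch of that step follows. It is only an illustration of the mechanism, not s6's actual code, and the names reap_children and demo are made up for this example.

```python
import os
import time


def reap_children():
    """Collect every child that has already exited; return their PIDs."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once instead of blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def demo():
    """Fork a child that exits immediately, then reap it."""
    child = os.fork()
    if child == 0:
        os._exit(0)  # child: exit right away; it is a zombie until waited for
    reaped = []
    deadline = time.monotonic() + 2.0
    while child not in reaped and time.monotonic() < deadline:
        reaped += reap_children()
        time.sleep(0.05)
    return child, reaped


if __name__ == "__main__":
    child, reaped = demo()
    print("forked", child, "reaped", reaped)
```

If nothing in the container runs a loop like this for the orphaned children, they stay in state Z, which matches what the listings below show.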
I found the following reference, which suggests the problem might be a 
kernel problem: 
https://github.com/just-containers/s6-overlay/issues/135 
although I'm not seeing the high zombie CPU usage mentioned there. I also 
found 
https://wiki.gentoo.org/wiki/S6 
which suggested that sending a SIGCHLD to s6-svscan would cause it to 
re-scan for zombies, but that didn't work.
Here are the processes once everything is started (viewed by "ps axl" 
after running bash in a separate connection to the container):
> root_at_4fa66da81d02:/# ps axl
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 4     0     1     0 20   0    196     4 poll_s Ss+  pts/0      0:00 s6-svscan -t0 /var/run/s6/services
> 4     0    35     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise s6-fdholderd
> 4     0   228     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise thttpd
> 4     0   229     1 20   0    196     4 poll_s S+   pts/0      0:00 s6-supervise exrouter
> 4 65534   232   228 30  10 179052 165784 poll_s SNs ?          0:00 /opt/pdm/bin/thttpd -nip -nos -c **.html|**.sh|
> 4     0   233   229 30  10   6224  1568 poll_s SNs  ?          0:00 /opt/pdm/bin/exrouter-cpp
> 4     0   247     0 20   0   5996  3756 do_wai Ss   pts/1      0:00 bash
> 4     0   255   247 20   0   7568  3024 -      R+   pts/1      0:00 ps axl
And, once I issue a ^C to the container, but before any timeout:
> root_at_4fa66da81d02:/# ps axl
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 4     0     1     0 20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick -D  3000  -n  S6_SERVICES
> 4 65534   232     1 30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
> 4     0   233     1 30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
> 4     0   247     0 20   0   5996  3860 do_wai Ss   pts/1      0:00 bash
> 0     0   271     1 20   0    176     4 do_wai S+   pts/0      0:00 foreground  s6-svwait -D  -t  10000  /var/run/
> 4     0   278   271 20   0    204     8 poll_s S+   pts/0      0:00 s6-svwait -D -t 10000 /var/run/s6/services/thtt
> 4     0   279   278 20   0    452     4 poll_s S+   pts/0      0:00 s6-ftrigrd
> 4     0   280   247 20   0   7568  2976 -      R+   pts/1      0:00 ps axl
And, after the system times out and sends SIGTERM to all the processes:
> root_at_4fa66da81d02:/# ps axl
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 4     0     1     0 20   0    176     4 do_wai Ss+  pts/0      0:00 foreground  backtick -D  3000  -n  S6_KILL_GRA
> 4 65534   232     1 30  10      0     0 -      ZNs  ?          0:00 [thttpd] <defunct>
> 4     0   233     1 30  10      0     0 -      ZNs  ?          0:00 [exrouter-cpp] <defunct>
> 4     0   279     1 20   0      0     0 -      Z+   pts/0      0:00 [s6-ftrigrd] <defunct>
> 0     0   285     1 20   0    168     4 poll_s S+   pts/0      0:00 s6-sleep -m -- 10000
> 4     0   292     0 20   0   5992  3760 do_wai Ss   pts/1      0:00 bash
> 4     0   300   292 20   0   7568  3080 -      R+   pts/1      0:00 ps axl
You can see:
- The managed processes are "thttpd" and "exrouter"
- I bumped the timeouts to 10000ms for the above tests
- When s6-svscan decides to exit, it sends signals to all the managed 
processes, and the s6-supervise processes exit, but the two managed 
processes become zombies and aren't reaped
- Timing out still doesn't kill thttpd or exrouter (although it does 
kill bash, so I had to reconnect to gather the third "ps axl")
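To check for leftover zombies without reading through "ps axl" by eye, something like the following Python sketch could be run inside the container. It assumes a Linux /proc filesystem and reads the state and parent PID fields from /proc/<pid>/stat; the function names parse_stat and find_zombies are made up for this example.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    return int(left), comm, fields[0], int(fields[1])


def find_zombies(proc="/proc"):
    """Return a sorted list of (pid, ppid, comm) for processes in state Z."""
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        path = os.path.join(proc, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process disappeared or is unreadable; skip it
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return sorted(zombies)


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"{pid}\t{ppid}\t{comm}")
```

A zombie whose parent PID is 1 is waiting on whatever is currently running as PID 1 in the container to call wait() on it.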
It's easy to cut the timeout to, say, 100ms, but I'd much rather have a 
correct shutdown sequence, as that's why I switched to s6 in the first 
place.
Any ideas?
Thanks,
Dan
-- 
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447  griscom_at_suitable.com  http://www.suitable.com/
Received on Wed Jul 21 2021 - 21:19:26 CEST