Hello, all. I'm using s6 as the init process manager in a Docker
container, via s6-overlay. Everything's working fine, but when I send a
SIGINT to the container, the managed processes exit but then become
zombies and aren't reaped, forcing the system to time out (twice,
actually).
I'm using ubuntu:20.04 as the container image with s6-overlay amd64
version 2.2.0.3, which I believe has the latest s6. It all runs on an
Ubuntu 18.04 host. It looks like s6-svscan sends SIGINT or SIGTERM to
the processes, and then uses s6-svwait to wait for the processes to exit,
but the zombie processes are never reaped.
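For readers less familiar with the term, "reaping" just means the parent
calling wait() on a child that has exited. The following is a minimal
Python sketch of the non-blocking reap loop an init process is generally
expected to run when it gets SIGCHLD. It is not s6's actual code, and the
function names are hypothetical, chosen only for illustration.

```python
import os


def decode_status(status):
    """Turn a raw wait status into an exit code (negative means killed by a signal)."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return None


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs. Hypothetical helper: a real
    init would typically call something like this from its SIGCHLD handling.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append((pid, decode_status(status)))
    return reaped
```

Until some call like this collects a child, the child stays in the process
table in state Z, which is what the ps listings below show for thttpd and
exrouter-cpp.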
I found the following reference that suggests the problem might be a
kernel problem:
https://github.com/just-containers/s6-overlay/issues/135
although I'm not seeing the high zombie CPU usage referenced there. I
also found https://wiki.gentoo.org/wiki/S6 , which suggested that sending
a SIGCHLD to s6-svscan would cause it to re-scan for zombies, but that
didn't work.
Here are the processes once everything is started (viewed by "ps axl"
after running bash in a separate connection to the container):
> root_at_4fa66da81d02:/# ps axl
> F   UID PID PPID PRI NI    VSZ    RSS WCHAN  STAT TTY   TIME COMMAND
> 4     0   1    0  20  0    196      4 poll_s Ss+  pts/0 0:00 s6-svscan -t0 /var/run/s6/services
> 4     0  35    1  20  0    196      4 poll_s S+   pts/0 0:00 s6-supervise s6-fdholderd
> 4     0 228    1  20  0    196      4 poll_s S+   pts/0 0:00 s6-supervise thttpd
> 4     0 229    1  20  0    196      4 poll_s S+   pts/0 0:00 s6-supervise exrouter
> 4 65534 232  228  30 10 179052 165784 poll_s SNs  ?     0:00 /opt/pdm/bin/thttpd -nip -nos -c **.html|**.sh|
> 4     0 233  229  30 10   6224   1568 poll_s SNs  ?     0:00 /opt/pdm/bin/exrouter-cpp
> 4     0 247    0  20  0   5996   3756 do_wai Ss   pts/1 0:00 bash
> 4     0 255  247  20  0   7568   3024 -      R+   pts/1 0:00 ps axl
And, once I issue a ^C to the container, but before any timeout:
> root_at_4fa66da81d02:/# ps axl
> F   UID PID PPID PRI NI  VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
> 4     0   1    0  20  0  176    4 do_wai Ss+  pts/0 0:00 foreground backtick -D 3000 -n S6_SERVICES
> 4 65534 232    1  30 10    0    0 -      ZNs  ?     0:00 [thttpd] <defunct>
> 4     0 233    1  30 10    0    0 -      ZNs  ?     0:00 [exrouter-cpp] <defunct>
> 4     0 247    0  20  0 5996 3860 do_wai Ss   pts/1 0:00 bash
> 0     0 271    1  20  0  176    4 do_wai S+   pts/0 0:00 foreground s6-svwait -D -t 10000 /var/run/
> 4     0 278  271  20  0  204    8 poll_s S+   pts/0 0:00 s6-svwait -D -t 10000 /var/run/s6/services/thtt
> 4     0 279  278  20  0  452    4 poll_s S+   pts/0 0:00 s6-ftrigrd
> 4     0 280  247  20  0 7568 2976 -      R+   pts/1 0:00 ps axl
And, after the system times out and sends SIGTERM to all the processes:
> root_at_4fa66da81d02:/# ps axl
> F   UID PID PPID PRI NI  VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
> 4     0   1    0  20  0  176    4 do_wai Ss+  pts/0 0:00 foreground backtick -D 3000 -n S6_KILL_GRA
> 4 65534 232    1  30 10    0    0 -      ZNs  ?     0:00 [thttpd] <defunct>
> 4     0 233    1  30 10    0    0 -      ZNs  ?     0:00 [exrouter-cpp] <defunct>
> 4     0 279    1  20  0    0    0 -      Z+   pts/0 0:00 [s6-ftrigrd] <defunct>
> 0     0 285    1  20  0  168    4 poll_s S+   pts/0 0:00 s6-sleep -m -- 10000
> 4     0 292    0  20  0 5992 3760 do_wai Ss   pts/1 0:00 bash
> 4     0 300  292  20  0 7568 3080 -      R+   pts/1 0:00 ps axl
You can see:
- The managed processes are "thttpd" and "exrouter"
- I bumped the timeouts to 10000ms for the above tests
- When s6-svscan decides to exit, it sends signals to all the managed
processes, and the s6-supervise processes exit, but the two managed
processes become zombies (now parented to PID 1) and aren't reaped
- Timing out still doesn't get rid of thttpd or exrouter (although it
does kill bash, so I had to reconnect to gather the third "ps axl")
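As a way to watch for this without eyeballing "ps axl", here is a small
Python sketch that lists zombies together with their parent PID by reading
/proc/<pid>/stat. It assumes a Linux /proc filesystem, and the helper
names (parse_stat_line, list_zombies) are made up for this example.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state, ppid).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first " (" and the last ") ".
    """
    pid_str, rest = line.split(" (", 1)
    comm, fields = rest.rsplit(") ", 1)
    parts = fields.split()
    return int(pid_str), comm, parts[0], int(parts[1])


def list_zombies(proc_root="/proc"):
    """Return sorted (pid, comm, ppid) tuples for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                pid, comm, state, ppid = parse_stat_line(f.read())
        except OSError:
            continue  # the process went away between listdir() and open()
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return sorted(zombies)
```

Calling list_zombies() inside the container during shutdown should report
the same defunct entries as the listings above, each with a parent PID of 1.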
It's easy to cut the timeout to, say, 100ms, but I'd much rather have a
correct shutdown sequence, as that's why I switched to s6 in the first
place.
Any ideas?
Thanks,
Dan
--
Daniel T. Griscom
152 Cochrane Street, Melrose, MA 02176-1433
(781) 662-9447 griscom_at_suitable.com http://www.suitable.com/
Received on Wed Jul 21 2021 - 21:19:26 CEST