Unexpected behavior when supervise/ is a broken symlink

From: Colin Booth <colin_at_heliocat.net>
Date: Mon, 26 Oct 2020 03:28:11 +0000

Not sure if this is a skaware@ or supervision@ report, but it directly
impacts s6, so starting here.

In a setup where the supervise subdirectory of a service is a symlink to
some other location (such as with Void where /etc/sv/$svc/supervise is a
symlink to /run/runit/supervise.$svc) and that target does not currently
exist, attempting to run the service via s6 results in a failure.

For example:
carbon:~/tmp$ ls
service supervise.d
carbon:~/tmp$ ls service/
event run supervise
carbon:~/tmp$ file service/supervise
service/supervise: broken symbolic link to ../supervise.d/supervise
carbon:~/tmp$ s6-supervise service/
s6-supervise service/: fatal: unable to mkfifo supervise/control: No such file or directory

This appears to be because, when mkdir("supervise", 0700) fails with
EEXIST, s6-supervise goes on to create the control files anyway. That
would be fine, except that mkdir() also fails with EEXIST on a broken
symlink: the symlink itself exists but its target does not, so the
mkfifo() calls that follow cannot succeed.

This issue is fairly easy to work around: either avoid symlinked
control directories, or get the list of symlinked control directories
and pre-generate their targets. The only reason it is an issue at all
is that it is a break from daemontools and runit behavior, both of
which attempt to fix broken supervise/ symlinks when they encounter
them.

I have a workaround in place for a mostly drop-in replacement of runit
with s6 on Void, but it doesn't support net-new services without a
reboot. Obviously the long-term plan is to migrate everything active
into s6-rc, but booting on s6-linux-init was step one.

Cheers!
-- 
Colin Booth
Received on Mon Oct 26 2020 - 03:28:11 UTC
