Re: s6-linux-init: Actions after unmounting filesystems

From: Laurent Bercot <ska-supervision@skarnet.org>
Date: Sun, 18 Aug 2019 18:35:52 +0000

>Simply excluding filesystems doesn't help when the root filesystem is on one of
>these devices that needs teardown actions after being unmounted. In that case,
>the only workable solution is to have PID1 pivot_root() to a tmpfs with the
>teardown/reboot tools in it. That way you can actually fully unmount the former
>root filesystem.

Are there systems in the real world that actually work like that? That
need pivoting into a "shutdownramfs" in order to be able to unmount the
rootfs and perform teardown operations on it? This is doable, of course,
but sounds too complex and too brittle. You'd need more than a fsck to
recover after a power loss, for instance.
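
For concreteness, here is roughly the sequence such a setup implies -
a sketch only, with hypothetical paths and no error handling; this is
not something s6-l-i provides:

  /* Sketch of a "shutdownramfs" pivot. Paths are made up. */
  #include <sys/mount.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main (void)
  {
    /* A tmpfs pre-populated with the teardown/reboot tools. */
    mount("tmpfs", "/shutdownfs", "tmpfs", 0, "size=8m") ;
    /* ... copy the teardown tools into /shutdownfs here ... */
    mkdir("/shutdownfs/oldroot", 0755) ;
    chdir("/shutdownfs") ;
    /* Make the tmpfs the new root; the old root lands on ./oldroot.
       glibc has no pivot_root() wrapper, hence the raw syscall. */
    syscall(SYS_pivot_root, ".", "oldroot") ;
    chroot(".") ;
    chdir("/") ;
    /* The former rootfs can now be fully unmounted and torn down. */
    umount("/oldroot") ;
    execv("/bin/teardown", (char *[]){ "teardown", 0 }) ;
    return 111 ;
  }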

This is a real question, because if s6-l-i has to address that case,
it needs some deeper thought about stage 4 management - it essentially
means that the after-shutdown hook must be exec'ed, not called, because
paths in the stage 4 script may not be valid anymore after a pivot_root.
(The stage 4 script is just a call to s6-l-i-hpr -f, but that call is
a critical part of the shutdown procedure and it has to succeed.)
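
To illustrate the difference - sketch only, the hook path is made up:

  #include <sys/wait.h>
  #include <unistd.h>

  /* Calling the hook: stage 4 must still run s6-l-i-hpr -f afterwards,
     so that binary's path must remain valid. */
  void stage4_calling (void)
  {
    pid_t pid = fork() ;
    if (!pid)
    {
      execv("/etc/shutdown-hook", (char *[]){ "shutdown-hook", 0 }) ;
      _exit(127) ;
    }
    waitpid(pid, 0, 0) ;
    /* If the hook pivot_root()ed away, this path may be gone now,
       and the critical hard reboot fails: */
    execv("/sbin/s6-linux-init-hpr", (char *[]){ "s6-linux-init-hpr", "-f", "-r", 0 }) ;
  }

  /* Exec'ing the hook: control is handed over entirely, nothing from
     the old root needs to stay valid, and the hook itself becomes
     responsible for the final hard reboot. */
  void stage4_execing (void)
  {
    execv("/etc/shutdown-hook", (char *[]){ "shutdown-hook", 0 }) ;
  }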


>Under the current architecture, I think it is ideal to maintain the
>mount/unmount symmetry as much as possible. This is done in your service manager
>via dependencies. Have a oneshot for each filesystem that needs to be mounted,
>and (when applicable) have that oneshot depend on another oneshot that
>creates/destroys the underlying lvm/md/dm-crypt device. Then the dm device will
>be created before mount, and destroyed after unmount.
>
>Yes, unmounting will fail if the filesystem is in use (or already unmounted),
>and destroying will fail if the filesystem is still mounted, but that's okay
>because a `down` script should be idempotent and /always/ return success. Even
>when there are failures, this will minimize the number of loose ends left when
>shutdownd takes over. The unmounting done by shutdownd should be treated as a
>last resort.

I agree that keeping as much symmetry as possible is cleaner, but
an unmount failing because the fs is still in use is not the exception,
it's _the common case_ if no nuke is performed before the unmount. And
if a "destroy" action has to be performed after an unmount, and cannot
be done if the unmount fails, then it still needs to be performed after
the stage 4 unmount - so it still needs to be registered in the hook,
there's no way around it.
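
For illustration, idempotent "down" logic along those lines could look
like this - a sketch, with a hypothetical mountpoint and device name:

  #include <sys/mount.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main (void)
  {
    pid_t pid ;
    /* Failure (EBUSY: still in use, EINVAL: already unmounted) is
       tolerated; the stage 4 unmount is the last resort. */
    umount("/mnt/data") ;
    pid = fork() ;
    if (!pid)
    {
      /* Fails harmlessly if the fs is still mounted. */
      execv("/sbin/cryptsetup", (char *[]){ "cryptsetup", "close", "data", 0 }) ;
      _exit(127) ;
    }
    waitpid(pid, 0, 0) ;
    return 0 ;  /* always report success */
  }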


>As a side note: what if there was a oneshot that did `kill -1` when brought
>down, and this oneshot depended on all of your filesystem mounts? Other than the
>obvious problem of s6-rc-update nuking your system, would it be possible to make
>s6-rc recover from being nuked and continue a shutdown?

  s6-rc-update would only nuke your system if there was a change below
the services that mount your filesystems :)

  That said, I don't like the idea of sending a nuke while the service
manager is still active. There would be a way to make s6-rc recover:
have a special service that checks a file when it starts, and if the
file is there, it's a recovery mark and an instruction to run the rest
of the procedure. (That's exactly what s6-l-i-shutdownd does to run
stage 4 after it gets killed by the stage 3 nuke.) But it would mean
an additional ad-hoc service, and as cool as it is to see a
supervision tree automatically restart processes, I'm not a fan of
triggering the Apocalypse while we're still in the process of tidying
up the world in an orderly fashion. Let's tidy up what we can and
exit the stage before killing everyone. That's just being civilized.
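
  The recovery-mark pattern itself, as a sketch with hypothetical paths:

  #include <unistd.h>

  int main (void)
  {
    /* If the mark file exists, we were nuked mid-shutdown: resume the
       procedure instead of starting the service normally. */
    if (!access("/run/shutdown-resume", F_OK))
    {
      execv("/etc/rc.shutdown.resume", (char *[]){ "rc.shutdown.resume", 0 }) ;
      return 111 ;
    }
    /* ... normal service startup ... */
    return 0 ;
  }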

  I'm afraid the best utility/complexity ratio is just to call a hook
in stage 4, and declare it the responsibility of the hook writer to
ensure the hook doesn't hang - it has to either exit in a reasonable
time or perform a hard reboot itself (e.g. in a pivot_root situation).
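
  A hook can bound its own run time, for instance - sketch only, the
timeout and teardown command are made up:

  #include <signal.h>
  #include <sys/reboot.h>
  #include <sys/wait.h>
  #include <unistd.h>

  static void panic (int sig)
  {
    (void)sig ;
    sync() ;
    reboot(RB_AUTOBOOT) ;  /* hard reboot, no coming back */
  }

  int main (void)
  {
    pid_t pid ;
    signal(SIGALRM, panic) ;
    alarm(30) ;  /* hard deadline for the whole hook */
    pid = fork() ;
    if (!pid)
    {
      execv("/sbin/teardown-storage", (char *[]){ "teardown-storage", 0 }) ;
      _exit(127) ;
    }
    waitpid(pid, 0, 0) ;
    return 0 ;  /* finished in time: control goes back to stage 4 */
  }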

  Now the fun part for me is to find a way for s6-l-i-umountall to
leave the proper filesystems mounted. It's not as straightforward as
it seems: if /dev is a symlink to, say, /mnt/tmpfs/dev, then you want
to keep /mnt/tmpfs/dev, even if it means you have to keep /mnt/tmpfs.
But if you have a second dev mount on /usr/local/mychroot/dev, then
you want to unmount that one, so you can unmount /usr/local. I suppose
I can count the number of devtmpfs, procfs and sysfs mounts, and only keep
one of each (the first that was mounted), but I have to think more
about it to make sure it catches everything.
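
  A first approximation of that counting idea - a sketch that does not
yet handle the symlinked-/dev case above:

  #include <mntent.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mount.h>

  #define MAXMNT 256

  int main (void)
  {
    struct mntent *m ;
    char mnt[MAXMNT][256] ;
    int keep[MAXMNT] = { 0 } ;
    int n = 0, i ;
    int dev = 0, proc = 0, sys = 0 ;
    FILE *fp = setmntent("/proc/self/mounts", "r") ;
    if (!fp) return 111 ;
    while ((m = getmntent(fp)) && n < MAXMNT)
    {
      /* Keep only the first devtmpfs, procfs and sysfs mounts. */
      if (!strcmp(m->mnt_type, "devtmpfs") && !dev++) keep[n] = 1 ;
      else if (!strcmp(m->mnt_type, "proc") && !proc++) keep[n] = 1 ;
      else if (!strcmp(m->mnt_type, "sysfs") && !sys++) keep[n] = 1 ;
      snprintf(mnt[n], sizeof mnt[n], "%s", m->mnt_dir) ;
      n++ ;
    }
    endmntent(fp) ;
    for (i = n - 1 ; i >= 0 ; i--)  /* reverse order: children first */
      if (!keep[i]) umount(mnt[i]) ;
    return 0 ;
  }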

--
  Laurent