On 17/09/2015 07:43, Colin Booth wrote:
> I thought moves (directory or otherwise) were atomic.

Moves you can perform with rename(), yes, they are.
But when the longrun is kept up during the update, you need to
copy a whole new set of files (the contents of the new service
directory) into the old service directory, which you have to
keep in order not to break its identification by the supervision
tree. So it's not just a rename, it's modifying a whole set of
files in a *live* service directory. And there's no doing that
atomically.
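To make the difference concrete, here is a minimal Python sketch. It is not s6-rc code: the directory names, the file contents and the helper names (install_by_rename, update_in_place, snapshot, demo) are all made up for the example. It contrasts a single rename() onto a fresh path with a file-by-file copy into a directory that has to keep its identity, and records what an observer would see after each copy.

```python
import os
import shutil
import tempfile


def install_by_rename(new_dir, target):
    # One rename(2) call: an observer sees either no target at all or the
    # complete directory. Assumes target does not exist yet.
    os.rename(new_dir, target)


def update_in_place(new_dir, live_dir, observe):
    # Copy file by file into a directory that must stay where it is.
    # observe() runs after each copy to expose the intermediate states.
    for name in sorted(os.listdir(new_dir)):
        shutil.copy2(os.path.join(new_dir, name), os.path.join(live_dir, name))
        observe()


def snapshot(directory):
    # Map each file name in the directory to its contents.
    result = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            result[name] = f.read()
    return result


def demo():
    root = tempfile.mkdtemp()
    live = os.path.join(root, "live")
    new = os.path.join(root, "new")
    fresh = os.path.join(root, "fresh")
    for directory, content in ((live, "old"), (new, "new")):
        os.mkdir(directory)
        for name in ("run", "finish"):
            with open(os.path.join(directory, name), "w") as f:
                f.write(content)

    seen = []
    update_in_place(new, live, lambda: seen.append(snapshot(live)))

    # For contrast: the same new directory installed with a single rename.
    install_by_rename(new, fresh)
    fresh_state = snapshot(fresh)

    shutil.rmtree(root)
    return seen, fresh_state


if __name__ == "__main__":
    states, fresh_state = demo()
    for state in states:
        print(state)
    print(fresh_state)
```

The first recorded state pairs a new "finish" with an old "run": that mixed state is exactly what a live service directory can expose during an update, and what the rename path never shows.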
I disable the supervisor for the directory update, so if the
service dies at the wrong time, at least it will not be restarted
until its set of files is consistent again. But there's no such
protection for the ./finish script, so there are serious aerobatics
involved to minimize the window where ./finish is invoked while
its service directory is wildly changing.

I could theoretically add a control command to s6-supervise to
make it delay the execution of ./finish. But I don't think it would
be worth it: it adds significant risks (what if a process sends a
"block" command, then dies or otherwise fails to send an "unblock"
command?), and complexity, for an extreme corner case that will
probably never happen. If a ./finish failure is critical, the user
should simply tell s6-rc-update to restart the service, which is
100% safe because the service directory will then be updated offline
instead of live.
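The block/unblock risk mentioned above can be shown with a toy Python model. It is purely hypothetical: s6-supervise has no such command, and FinishGate, control and run_updater are names invented for the illustration. The "death" of the updater is modelled as an early return, on the assumption that a killed process runs no cleanup code.

```python
class FinishGate:
    """Hypothetical supervisor that can be told to delay ./finish."""

    def __init__(self):
        self.blocked = False
        self.finish_ran = False

    def control(self, command):
        if command == "block":
            self.blocked = True
        elif command == "unblock":
            self.blocked = False
        else:
            raise ValueError(f"unknown command: {command}")

    def service_died(self):
        # ./finish only runs if nobody is currently blocking it.
        if not self.blocked:
            self.finish_ran = True
        return self.finish_ran


def run_updater(gate, dies_midway):
    gate.control("block")
    if dies_midway:
        return  # models the updater being killed: "unblock" is never sent
    gate.control("unblock")


if __name__ == "__main__":
    for dies in (False, True):
        gate = FinishGate()
        run_updater(gate, dies_midway=dies)
        print(dies, gate.service_died(), gate.blocked)
```

In the second case the gate stays blocked and ./finish never runs. Recovering from that would need a timeout or some tracking of the blocking process, which is the extra complexity in question.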
--
Laurent
Received on Thu Sep 17 2015 - 10:40:40 UTC