runit and "sv check" for dependencies

From: James Byrne <james.byrne_at_origamienergy.com>
Date: Wed, 14 Jan 2015 16:24:19 +0000

Hi,

I am working on an embedded Linux system where I want to use the 'runit'
tools to start various system services, and I have an issue where "sv
check" doesn't seem to behave in a useful way.

I have seen it suggested (specifically in the article at
http://rubyists.github.io/2011/05/02/runit-for-ruby-and-everything-else.html)
that "sv check" can be used to implement dependencies in the run file.
The example given in the article is:

/service/lighttpd/run:
   #!/bin/sh -e
   sv -w7 check postgresql
   exec 2>&1 lighttpd -f /etc/lighttpd/lighttpd.conf -D

It goes on to say "This would wait 7 seconds for the postgresql service
to be running, exiting with an error if that timout is reached. runsv
will then run this script again. Lighttpd will never be executed unless
sv check exits without an error (postgresql is up)."

However, in practice this does not work, because "sv check" returns
exit code 0 even if the "postgresql" service is down, or if it failed to
run at all (i.e. if postgresql/run exited with a non-zero exit code).

I have looked at the code and done various tests (using runit 2.1.2),
and "sv check" doesn't appear to be very useful with its current
behaviour. The documentation is ambiguous about what it does, saying
that it will:

"Check for the service to be in the state that’s been requested. Wait up
to 7 seconds for the service to reach the requested state, then report
the status or timeout."

This doesn't really make sense, because there isn't any such thing as
the "requested state".

My solution is to make the following change to sv.c:

--- old/sv.c 2014-08-10 19:22:34.000000000 +0100
+++ new/sv.c 2015-01-14 14:29:31.384556297 +0000
@@ -227,7 +227,7 @@
        if (!checkscript()) return(0);
        break;
      case 'd': if (pid || svstatus[19] != 0) return(0); break;
-     case 'C': if (pid) if (!checkscript()) return(0); break;
+     case 'C': if (!pid || !checkscript()) return(0); break;
      case 't':
      case 'k':
        if (!pid && svstatus[17] == 'd') break;

With this change, "sv check" works in a much more useful way. If all the
services specified are up it will exit with exit code 0, and if not it
will wait until the timeout for them to come up, and return a non-zero
exit code if any are still down.
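
To make the difference concrete, here is a minimal stand-alone model of the two versions of the 'C' case. It is only a sketch, not the real sv.c code: the function names check_old and check_new are made up for illustration, checkscript() is replaced by a boolean parameter, and it assumes that returning 0 means "not in the wanted state yet, keep waiting" while falling through to the end means success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the 'C' case; NOT the actual sv.c source.
 * Assumed convention: 1 = check satisfied, 0 = not yet (sv keeps waiting
 * and eventually times out). pid == 0 models a service that is down;
 * checkscript_ok stands in for the result of checkscript(). */

/* Original: case 'C': if (pid) if (!checkscript()) return(0); break; */
int check_old(int pid, bool checkscript_ok)
{
    assert(pid >= 0);
    if (pid)
        if (!checkscript_ok)
            return 0;   /* running, but the check script failed */
    return 1;           /* also reached when pid == 0, i.e. service down */
}

/* Patched: case 'C': if (!pid || !checkscript()) return(0); break; */
int check_new(int pid, bool checkscript_ok)
{
    assert(pid >= 0);
    if (!pid || !checkscript_ok)
        return 0;       /* down, or running but the check script failed */
    return 1;           /* running and the check script passed */
}
```

In this model the two versions only disagree when pid is 0: check_old(0, true) reports success, while check_new(0, true) does not, which matches the behaviour described above.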

Is there any reason why I should not make this change? Have I
misunderstood what "sv check" is supposed to do? If this change is OK,
could it be included in future releases of "runit"?

Regards,

James Byrne
Received on Wed Jan 14 2015 - 16:24:19 UTC
