Re: s6: something like runit's ./check script from Laurent Bercot on 2015-09-03 (supervision)

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Thu, 3 Sep 2015 14:54:43 +0200

On 03/09/2015 12:44, Crest wrote:
> Starting an unsupervised background process is a brittle workaround.

  It's not a brittle workaround, it's the exact right way to do it.
You want to poll for service readiness until a certain timeout? Well,
then start a process that polls for service readiness until a certain
timeout. The process can be 2 lines of shell. It doesn't get any
simpler than this; I really don't see what the big deal is.

  Is it the fact that it's not integrated into s6-supervise that is
bothering you? Well, I'm not integrating low-quality features into
s6-supervise if they can be trivially implemented as a script. The
script I suggested is not ad hoc: it's a straightforward, clean
implementation of the feature. It can be refined if you want to be
able to configure the number of tries and the polling period, which
implies something like reading two environment variables; it's not
exactly rocket science.
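
  As a rough sketch of such a poller, assuming nothing about s6 itself:
the variable names POLL_TRIES and POLL_PERIOD and the function name
poll_ready are made up for illustration, and the command to poll with is
whatever test suits the service.

```sh
#!/bin/sh
# poll_ready CMD [ARGS...]
# Runs CMD until it succeeds or the tries run out.
# POLL_TRIES and POLL_PERIOD are hypothetical names, not s6 settings.
poll_ready() {
  tries=${POLL_TRIES:-10}    # number of attempts, default 10
  period=${POLL_PERIOD:-1}   # seconds between attempts, default 1
  while [ "$tries" -gt 0 ]; do
    "$@" && return 0
    tries=$((tries - 1))
    # Only sleep if another attempt is coming.
    if [ "$tries" -gt 0 ]; then
      sleep "$period"
    fi
  done
  return 1
}
```

  Called as "poll_ready ./check", for instance, it returns 0 as soon as
./check succeeds and 1 once the tries are exhausted.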

  Is it the fact that you want supervised, continuous polling? In
that case, you're outside of the "readiness notification" territory
and clearly in "service-specific monitoring" territory; the s6
notification mechanism is not, and cannot be, powerful enough to
express all you can do with service-specific monitoring. The right
way to proceed, as you wrote yourself, is to have a separate supervised
service that performs the monitoring; but don't try and plug back the
result of the monitoring into the s6 notification mechanism! Instead,
just use that result directly to send the proper alerts, take the proper
action, etc. This is entirely service-specific and s6 cannot help you
there except in supervising your monitor.
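
  A separate monitor of that kind might look roughly like the following.
Everything in it is a placeholder: check_daemon stands for whatever probe
fits the service, send_alert for whatever action should be taken, and the
30-second period is arbitrary.

```sh
#!/bin/sh
# Placeholder probe and action; replace both with service-specific commands.
check_daemon() { ./check; }
send_alert() { echo "state change: $1 -> $2" >&2; }

# monitor_step PREV
# Probes once, acts directly on a state change, prints the new state.
monitor_step() {
  if check_daemon; then cur=ok; else cur=failed; fi
  if [ "$cur" != "$1" ]; then
    send_alert "$1" "$cur"
  fi
  echo "$cur"
}

# Main loop of the monitor's own ./run; the supervisor restarts the
# monitor if it dies, and that is all it is asked to do:
#   state=unknown
#   while :; do
#     state=$(monitor_step "$state")
#     sleep 30
#   done
```

  The result of each probe goes straight to the alerting action; nothing
is fed back into the readiness notification of the monitored service.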

  If you want your monitor to use s6-style notifications, you can:
set up your own fifodir, have your alert managers subscribe to it,
and have your monitor send s6-ftrig-notify messages to it: it will
work, and you can define your own service-specific protocol, most
likely richer and more complex than just "up, ready, down, ready",
between the monitor and its subscribers. Just don't hijack
s6-supervise's notification channel for that.
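
  To illustrate, here is a minimal sketch of such a private protocol. The
state names, the one-character events and the fifodir path are all
invented for the example; only s6-mkfifodir, s6-ftrig-notify and
s6-ftrig-wait are actual s6 programs. s6-ftrig-notify sends each
character of its message as a separate event, hence the one-letter codes.

```sh
#!/bin/sh
# Hypothetical mapping from monitor states to one-character events.
state_to_event() {
  case "$1" in
    healthy)  printf 'h' ;;
    degraded) printf 'g' ;;
    failed)   printf 'f' ;;
    *)        return 1 ;;
  esac
}

# publish_state FIFODIR STATE
# Sends the event for STATE to every current subscriber of FIFODIR.
publish_state() {
  event=$(state_to_event "$2") || return 1
  s6-ftrig-notify "$1" "$event"
}

# One-time setup, before any subscriber connects (path is an assumption):
#   s6-mkfifodir /run/mymonitor/events
# A subscriber that blocks until the monitor reports a failure:
#   s6-ftrig-wait /run/mymonitor/events f
# The monitor, after each probe:
#   publish_state /run/mymonitor/events failed
```

  The fifodir belongs to the monitor alone, so the events can mean
whatever the monitor and its subscribers agree on.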

  The corner cases and things that can go wrong will only happen when
you try to use a mechanism for a purpose it is not suited for. So
don't do that.

> The ./check script could fail after each test or include a polling loop,
> but it could also wait for more complex conditions e.g. wait for cluster
> to elect a leader or just implement the glue logic between s6's
> readiness notification protocol and the protocol(s) supported by the
> service.

  AIUI, Buck's request was to be able to use ./check scripts from runit
service directories as is with s6. Your suggestions here are changing the
semantics: ./check is meant as a one-time polling action. If you need
something more complex, script around ./check, or use something else
entirely.
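
  As a sketch of scripting around an unmodified ./check (not necessarily
the exact script proposed earlier in the thread): it assumes an s6
version where the service directory contains a notification-fd file
holding "3", so that ./run gets a notification pipe on descriptor 3. The
function name and "mydaemon" are placeholders.

```sh
#!/bin/sh
# notify_when_checked TRIES PERIOD
# Runs ./check up to TRIES times, PERIOD seconds apart. On the first
# success, writes a newline to descriptor 3, which s6-supervise takes as
# the readiness notification (assumes notification-fd contains 3).
notify_when_checked() {
  tries=$1
  while [ "$tries" -gt 0 ]; do
    if ./check; then
      echo >&3
      return 0
    fi
    tries=$((tries - 1))
    sleep "$2"
  done
  return 1
}

# In ./run, the poller goes to the background and the daemon replaces
# the shell, without keeping the notification descriptor open:
#   notify_when_checked 10 1 &
#   exec mydaemon 3>&-
```

  If ./check never succeeds, nothing is written and the service is simply
never reported as ready.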

  TL;DR:

  - polling is bad and easy; anybody can do bad and easy, and you don't
need s6 to hold your hand for that
  - runit ./check support can be achieved in s6 via a trivial ./run wrapper
and there's nothing hackish about that
  - long-term polling can be achieved via a separate service and there's
nothing hackish about that
  - if you want to specifically monitor a daemon with commands that poll its
state, don't involve the supervisor's notification mechanism, because it
can't be tailored to your exact needs. You can still use the s6-ftrig-*
mechanism if you want to publish your monitor's results.

-- 
  Laurent

Received on Thu Sep 03 2015 - 12:54:43 UTC
