Re: patch: sv check should wait when svrun is not ready

From: Buck Evan <>
Date: Tue, 17 Feb 2015 16:49:44 -0800

On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne <> wrote:
> On 2/17/2015 11:02 AM, Buck Evan wrote:
>> I think there's only three cases here:
>> 1. Users that would have gotten immediate failure, and no amount of
>> spinning would help. These users will see their error delayed by $SVWAIT
>> seconds, but no other difference.
>> 2. Users that would have gotten immediate failure, but could have gotten
>> a success within $SVWAIT seconds. All of these users will of course be glad
>> of the change.
>> 3. Users that would not have gotten immediate failure. None of these
>> users will see the slightest change in behavior.
>> Do you have a particular scenario in mind when you mention "breaking lots
>> of existing installations elsewhere due to a default behavior change"? I
>> don't see that there is any case this change would break.

Thanks for the thoughtful reply, Avery. My background is also
"maintaining business software", although putting it in those terms
gives me horrific visions of Java servlets and SOAP protocols.

> I have to look at it from a viewpoint of "what is everything else in the system expecting when this code is called". This means thinking in terms of code-as-API, so that calls elsewhere don't break.

As a matter of API, sv-check does sometimes take up to $SVWAIT seconds to fail.
Any caller to sv-check will be expecting this (strictly limited)
delay, in the exceptional case.
My patch just extends this existing, documented behavior to the
special case of "unable to open supervise/ok".
The API is unchanged; only the amount of time taken to return the result changes.

> This happens because the use of "sv check (child)" follows the convention of "check, and either succeed fast or fail fast", ...

Either you're confused about what sv-check does, or I'm confused about
what you're saying.
sv-check generally doesn't fail fast (except in the special case I'm
trying to make no longer fail fast -- svrun is not started).
Generally it will spin for $SVWAIT seconds before failing.

> Without that fast-fail, the logged hint never occurs; the sysadmin now has to figure out which of three possible services in a dependency chain are causing the hang.

Even if I put the above issue aside, you wouldn't get a hang,
you'd get the failure message you're familiar with, just several
seconds (default: 7) later. The sysadmin wouldn't search any more than
previously. He would however find that the system fails less often,
since it has that 7 seconds of tolerance now. This is how sv-check
behaves already when a ./check script exits nonzero.
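
To make the retry behavior concrete, here is a rough shell sketch of
"spin until success or the deadline passes". This is not runit's actual
code (sv is written in C); the function name check_with_wait and the
one-second polling interval are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical sketch, NOT runit's implementation: retry a check command
# until it succeeds or roughly $SVWAIT seconds have passed.
SVWAIT="${SVWAIT:-7}"   # 7 seconds is the default mentioned above

check_with_wait() {
    # "$@" is the check to run, e.g.: test -e supervise/ok
    waited=0
    while :; do
        if "$@"; then
            return 0            # check passed: succeed immediately
        fi
        if [ "$waited" -ge "$SVWAIT" ]; then
            return 1            # deadline reached: fail, as before
        fi
        sleep 1                 # assumed polling interval
        waited=$((waited + 1))
    done
}

# Example use (commented out so sourcing this file has no side effects):
# check_with_wait test -e supervise/ok || echo "not ready" >&2
```

The point of the sketch is that the caller still sees the same two
outcomes, success or failure; a check that can never pass just takes up
to $SVWAIT seconds longer to report its failure.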

> While this is
> implemented differently from other installations, there are known cases
> similar to what I am doing, where people have ./run scripts like this:
> #!/bin/sh
> sv check child-service || exit 1
> exec parent-service

This would still work just fine, just strictly more often.
Received on Wed Feb 18 2015 - 00:49:44 UTC