Re: patch: sv check should wait when svrun is not ready

From: Buck Evan <buck_at_yelp.com>
Date: Tue, 16 Jun 2015 09:37:38 -0700

I'd still like to get this merged.

Avery: are you the current maintainer?
I haven't seen Gerrit Pape on the list.

On Tue, Feb 17, 2015 at 4:49 PM, Buck Evan <buck_at_yelp.com> wrote:

> On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne <avery.p.payne_at_gmail.com>
> wrote:
> >
> > On 2/17/2015 11:02 AM, Buck Evan wrote:
> >>
> >> I think there are only three cases here:
> >>
> >> 1. Users that would have gotten immediate failure, and no amount of
> >> spinning would help. These users will see their error delayed by $SVWAIT
> >> seconds, but no other difference.
> >> 2. Users that would have gotten immediate failure, but could have gotten
> >> a success within $SVWAIT seconds. All of these users will of course be
> >> glad of the change.
> >> 3. Users that would not have gotten immediate failure. None of these
> >> users will see the slightest change in behavior.
> >>
> >> Do you have a particular scenario in mind when you mention "breaking lots
> >> of existing installations elsewhere due to a default behavior change"? I
> >> don't see that there is any case this change would break.
> <snip>
>
> Thanks for the thoughtful reply Avery. My background is also
> "maintaining business software", although putting it in those terms
> gives me horrific visions of java servlets and soap protocols.
>
> > I have to look at it from a viewpoint of "what is everything else in the
> > system expecting when this code is called". This means thinking in terms
> > of code-as-API, so that calls elsewhere don't break.
>
> As a matter of API, sv-check does sometimes take up to $SVWAIT seconds to
> fail.
> Any caller to sv-check will be expecting this (strictly limited)
> delay, in the exceptional case.
> My patch just extends this existing, documented behavior to the
> special case of "unable to open supervise/ok".
> The API is unchanged, just the amount of time to return the result is
> changed.
>
> > This happens because the use of "sv check (child)" follows the
> > convention of "check, and either succeed fast or fail fast", ...
>
> Either you're confused about what sv-check does, or I'm confused about
> what you're saying.
> sv-check generally doesn't fail fast (except in the special case I'm
> trying to make no longer fail fast -- svrun is not started).
> Generally it will spin for $SVWAIT seconds before failing.
>
> > Without that fast-fail, the logged hint never occurs; the sysadmin now
> > has to figure out which of three possible services in a dependency chain
> > are causing the hang.
>
> Even if I put the above issue aside, you wouldn't get a hang,
> you'd get the failure message you're familiar with, just several
> seconds (default: 7) later. The sysadmin wouldn't search any more than
> previously. He would however find that the system fails less often,
> since it has that 7 seconds of tolerance now. This is how sv-check
> behaves already when a ./check script exits nonzero.
>
>
> > While this is
> > implemented differently from other installations, there are known cases
> > similar to what I am doing, where people have ./run scripts like this:
> >
> > #!/bin/sh
> > sv check child-service || exit 1
> > exec parent-service
>
> This would still work just fine, just strictly more often.
>
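
For illustration only, the behavior argued for above (keep retrying for up
to $SVWAIT seconds instead of failing at once when supervise/ok cannot be
opened) can be sketched in plain sh. This is not the patch and not sv's
actual code. The function name wait_for_supervise is hypothetical, and
testing that supervise/ok exists is an assumed stand-in for sv's real check
of whether the supervisor is running. The 7-second default is the one
mentioned in the thread.

```sh
#!/bin/sh
# Hypothetical sketch, not the real sv implementation.
# wait_for_supervise DIR [TIMEOUT]
# Polls once per second until DIR/supervise/ok exists, and gives up after
# TIMEOUT seconds (default: $SVWAIT, or 7 if that is unset).
# Returns 0 if the file showed up in time, nonzero otherwise.
# Note: plain POSIX sh has no "local", so these variables are global.
wait_for_supervise() {
    dir=$1
    timeout=${2:-${SVWAIT:-7}}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Assumption: existence of supervise/ok approximates "supervisor is up".
        if [ -e "$dir/supervise/ok" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last look at the deadline, so the failure is bounded by TIMEOUT.
    [ -e "$dir/supervise/ok" ]
}
```

A ./run script like the one quoted above could call it the same way, for
example `wait_for_supervise ./child-service || exit 1` (the path is a
placeholder). A caller that already failed immediately still fails, only
later, and a caller whose supervisor comes up within the window succeeds.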
Received on Tue Jun 16 2015 - 16:37:38 UTC
