Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Fri, 24 Dec 2021 17:27:54 +0000

>Laurent,
>
>Thanks for the quick turnaround. Let's see if I understand. The key part is here:
>
> case SIGTERM :
> if (flagprotect) break ;
> case SIGHUP :
> handle_stdin = &last_stdin ;
> if (!indata.len) { prepare_to_exit() ; e = 1 ; }
> break ;
>
>The risk is that last_stdin() will try to read stdin after prepare_to_exit() is called.

  Yes. But the only place that can call last_stdin() is the part of the
main loop here:
https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1305

and that can only happen when there is actually something to read
(x[xindex0].revents contains IOPAUSE_READ) and, above that, when
fd 0 has been marked as a valid descriptor to poll on (xindex0 is
nonzero).

The problem is that the current iteration of the event loop was started
when we were not exiting yet, so flagexiting is 0, so xindex0 was
(possibly) nonzero:
https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1267
xindex0 has been invalidated by our state change; if we continue the
current iteration, we may take the path where (*handle_stdin)() is
called. That's the bug. We need to make sure xindex0 tracks the correct
state at all times, so restarting the loop (with "continue") when the
state has changed ensures xindex0 is valid, since


>This causes the next iteration of the event loop to set xindex0 zero because flagexiting was set by prepare_to_exit():
>
> if (!flagexiting && !(flagblock && r))
> {
> x[j].fd = 0 ;
> x[j].events = IOPAUSE_READ ;
> xindex0 = j++ ;
> }
> else xindex0 = 0 ;
>
>Right?

  Yes. xindex0 will be 0 in all future iterations - only the one where
flagexiting was just set to 1 was risky.

  A generic event loop pattern looks like this:

   for (;;)
   {
     before() ;
     if (exitcondition) break ;
     iopause() ;
     after() ;
   }

  before() reads state into variables and prepares the arguments to
iopause() - you want to only poll for reading on valid fds and when
you're not in lameduck mode (i.e. exiting), and only poll for writing
on fds where you actually have something to write.
  iopause() is the blocking poll()/select() primitive.
  after() is a list of actions to take depending on the state you've
gathered in before() and the events reported by iopause().

  If an action in after() changes the state that has been read in
before(), and this state has been cached in variables local to the
loop, those cache variables have now been invalidated, and further
tests in after() may yield incorrect results, perform actions that
you didn't want to take (such as reading on stdin), or fail to
perform actions that you should have been taking. To avoid that, when
such invalidation happens, the current iteration should be aborted (e.g.
via "continue") so the next invocation of before() can refresh the
state cache. If there are still events that have not been handled by
the current after(), it doesn't matter, because iopause() is level-
triggered and the events are still there so iopause() will return
instantly, and the next after() will take care of the events.


>It did cross my mind that another approach is to trade state space complexity for resource usage by creating an additional file descriptor to /dev/null during the initialisation path.

  Yeah, but no, this isn't a good trade-off. Tracking state in an
event loop is very cheap, and pretty easy unless you make a stupid
mistake (as I did before figuring out the pattern!), so it's really
not worth using an extra fd to dispense with that tracking.

--
  Laurent
Received on Fri Dec 24 2021 - 18:27:54 CET

This archive was generated by hypermail 2.4.0 : Fri Dec 24 2021 - 18:28:22 CET