Re: Unsightly errors due to race on closing stdin when handling signals in s6-log from Earl Chew on 2021-12-24 (skaware)

From: Earl Chew <earl_chew_at_yahoo.com>
Date: Fri, 24 Dec 2021 08:53:39 -0800

Laurent,

Thanks for the quick turnaround. Let's see if I understand. The key part
is here:

       case SIGTERM :
         if (flagprotect) break ;
       case SIGHUP :
         handle_stdin = &last_stdin ;
         if (!indata.len) { prepare_to_exit() ; e = 1 ; }
         break ;

The risk is that last_stdin() will try to read stdin after
prepare_to_exit() is called.

However, prepare_to_exit() is only called if !indata.len, and causes
handle_signals() to return 1.

The caller of handle_signals() is:

       if (x[0].revents & (IOPAUSE_READ | IOPAUSE_EXCEPT) &&
handle_signals()) continue ;

This causes the next iteration of the event loop to set xindex0 zero
because flagexiting was set by prepare_to_exit():

       if (!flagexiting && !(flagblock && r))
       {
         x[j].fd = 0 ;
         x[j].events = IOPAUSE_READ ;
         xindex0 = j++ ;
       }
       else xindex0 = 0 ;

Right?

It did cross my mind that another approach is to trade state space
complexity for resource usage by creating an additional file descriptor
to /dev/null during the initialisation path. With this additional file
descriptor, either:

a) have prepare_to_exit() either dup2() this to stdin, or

b) use an int inputfd = 0 to earmark the corresponding input source and
overwrite this to instead indicate the /dev/null file descriptor.

Earl

On 2021-12-24 00:46, Laurent Bercot wrote:
>
> So indeed, when an exit signal was received at the same time stdin was
> readable (unless your producer is spamming logs, that's a rare event,
> which is why I never saw it), prepare_to_exit() was called but the
> xindex0 marker was not updated and the remainder of the iteration still
> called *handle_stdin, yielding read errors.
>
> Now, when handle_signals() calls prepare_to_exit(), the event loop
> is restarted, so we're never handling events in an obsolete state.
>
> Please check the latest s6 git head and tell me if it works for you.
>
> The mistake of not restarting the event loop right away on state change
> is something I became aware of and stopped making in later software, but
> I never thought of going back and checking whether s6-log had it.
> Thanks!
>
> --
> Laurent
>
Received on Fri Dec 24 2021 - 17:53:39 CET

This archive was generated by hypermail 2.4.0 : Fri Dec 24 2021 - 17:54:18 CET