Re: fd_close() conceals errors from Laurent Bercot on 2020-04-27 (skaware)

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Mon, 27 Apr 2020 11:30:27 +0000

>Hi Laurent. It's been a while.
>:)

  Long time no see. :) We'd love to have you on #s6, if you can!

>I'm updating my packages to catch up with skalibs, gcc warnings, etc.,
>and fd_close() returning void is one of the changes that hit me.
>Looking over past discussions, I agree that POSIX is a mess
>here--nevertheless, given that close() is the way it is, I think it's
>best for fd_close() to report errors other than EINTR to the caller.
>What do you think?

  Ha. This has been the subject of several discussions, and a long
thought
process for me. This change has probably taken the most thinking time by
character in the diff.

  The long and short of it is that close() is a destructor (it frees up
resources that have previously been allocated), and with the years I've
found the following rule to be really important:

  *Destructors should never fail.*

  Breaking that rule is an incitation to terrible programming, and can
lead to absurd behaviour, a striking example of which is service
management systems (read: systemd) that refuse to shut a machine down,
and keep the machine looping in a dysfunctional state, when something
goes wrong during the shutdown procedure.

  What did it for me was to ask myself the question: what do you do if
a destructor fails?

  There is no good answer to that question. Do you consider the resource
still allocated? You need to free it to keep going, but attempting to
free it is obviously failing. Do you keep going and leak the resource?
If not, how do you report failure? Say you manage to report the failure
with a "resource cannot be freed" error code. What should the caller do?
Same exact question. So you defer to the caller again, and the problem
goes all the way back to main(), and then what should the program do?
exit just because it could not free a resource? What if this program is
your shutdown procedure?

  The only sane way to proceed is to keep trucking as if the destructor
did not fail. You might leak a resource, but there's no way to free it
and attempts to report it will just make you fail harder than if you
ignore it. If you can't free a resource, it's not something you can
handle, it's a system problem, and it's above your paygrade as a simple
program that called a destructor.

  And so, if the sanest thing for a caller to do is to ignore errors, it
means that generating errors is a no-no for a destructor. They should
not
be allowed to fail, period.

  So, yes, POSIX is terrible, and close() should return void in the first
place. Note that at least on Linux, the fd is closed even when close()
returns -1 EINTR! so in practice, at least on Linux it indeed never
fails.
(But even in that case the API is a problem, because you cannot assume
that every OS will do that; if the fd is closed then you should
return success; failure should mean that the fd still needs to be
closed; attempting to close it again is only valid in single-threaded
programs, because in multithreaded programs you could end up closing an
unrelated fd that was opened by another thread in a race. Yes, that's
the kind of mess you get yourself into when destructors fail.)

  fd_close() catches EINTR just for safety and because skalibs is mostly
used in single-threaded programs. EBADF is a programming error and
should
never happen. EIO and other potential error codes signal an underlying
problem, it's best to ignore them here, they will be raised again next
time there's a filesystem call.

  fd_close() does the job of a safe wrapper, which is to hide
infelicities
of POSIX behaviour. And that includes providing a sane API, i.e.
returning
void. :)

--
  Laurent

Received on Mon Apr 27 2020 - 11:30:27 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:38:49 UTC