execline: exit codes

How to propagate exit codes up a process dynasty

Say we have a parent process P, child of a grandparent process G, spawning a child process C and waiting for it. Either C dies normally with an exit code from 0 to 255, or it is killed by a signal. How can we make sure that P reports to G what happened to C, with as much precision as possible?

The problem is, there's more information in a wstat (the status value filled in by waitpid()) than a process can report by simply exiting. P could exit with the same exit code as C, but then what should it do if C has been killed by a signal?
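To make the gap concrete, here is a minimal Python sketch (an illustration only, not part of execline) that forks a child, waits for it, and decodes the resulting wstat. The helper name wait_outcome is hypothetical; os.fork, os.waitpid and the os.WIF*/os.W* functions are the standard POSIX-only interfaces of Python's os module.

```python
import os
import signal


def wait_outcome(child_action):
    """Fork a child that runs child_action, then decode its wstat.

    Returns ("exited", code) or ("signaled", signal_number).
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the action and never return into the parent's code.
        try:
            child_action()
        finally:
            os._exit(0)
    # Parent: waitpid() returns the pid and the raw wstat.
    _, wstat = os.waitpid(pid, 0)
    if os.WIFSIGNALED(wstat):
        return ("signaled", os.WTERMSIG(wstat))
    return ("exited", os.WEXITSTATUS(wstat))


if __name__ == "__main__":
    # A child that exits normally with code 3.
    print(wait_outcome(lambda: os._exit(3)))
    # A child that is killed by SIGKILL.
    print(wait_outcome(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

The parent gets a kind ("exited" or "signaled") plus a number, but when it exits itself it can only hand a single number from 0 to 255 to its own parent.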

One idea is to have P kill itself with the same signal that killed C. But that's actually not right, because P itself could be killed by a signal from another source, and G needs that information. "P has been killed by a signal" and "C has been killed by a signal" are two different pieces of information, so they should not be reported in the same way.

So, any way you look at it, there is always more information than we can report.

Shells have their own convention for reporting crashes, but since any exit code greater than 127 is reported as is, the information given by the shell is unreliable: "child exited 129" and "child was killed by SIGHUP" are indistinguishable. When shells get nested, all bets are off: the information conveyed by exit codes becomes devoid of meaning pretty fast. We need something better.
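The collision can be shown with a small model. The sketch below assumes the 128-plus-signal-number convention used by shells such as bash and dash (other shells may encode signals differently); shell_status is a hypothetical name, and the function only models the arithmetic, it does not run a shell.

```python
import signal


def shell_status(exit_code=None, signal_number=None):
    """Model of the status a shell reports for a finished child.

    Assumption: a signal death is reported as 128 + signal number,
    and a normal exit code is passed through unchanged.
    """
    if signal_number is not None:
        return 128 + signal_number
    return exit_code


# Two different events...
exited_129 = shell_status(exit_code=129)
killed_by_sighup = shell_status(signal_number=int(signal.SIGHUP))

# ...become indistinguishable when SIGHUP is signal 1, as it is on POSIX systems.
print(exited_129 == killed_by_sighup)
```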

execline's solution

execline commands that can report a child's exit code, such as if, proceed as follows when they are in the position of P:

  • If C was killed by a signal: P exits 128 plus the signal number.
  • If C exited 128 or more: P exits 128.
  • Else, P exits with the same code as C.
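The three rules above can be sketched as a single function. This is an illustration in Python, not execline's actual C code; the name propagated_code is hypothetical, and the input follows the convention of Python's subprocess module, where a negative return code -N means the child was killed by signal N.

```python
def propagated_code(returncode):
    """Code that P should exit with, given what happened to C.

    returncode uses the subprocess convention (an assumption of this
    sketch): negative means "killed by signal -returncode".
    """
    if returncode < 0:
        # C was killed by a signal: 128 plus the signal number.
        return 128 + (-returncode)
    if returncode >= 128:
        # C exited 128 or more: collapse to exactly 128.
        return 128
    # Common case: report C's exit code unchanged.
    return returncode
```

A wrapper written this way could end with sys.exit(propagated_code(subprocess.run(argv).returncode)), where argv is the child's command line.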

Rationale:

  • 128+ exit codes are extremely rare and should report really problematic conditions; commands usually exit 127 or less. If C exits 128+, it's more important to convey the information "something really bad happened, but the C process itself was not killed by a signal" than the exact nature of the event.
  • Commands following that convention can be nested. If P exits 129+, G knows that C was killed by a signal. If G also needs to report that to its parent, it will exit 128: G's parent will not know the signal number, but it will know that P reported 128 or more, so either C or a scion of C had problems.
  • Exact information is reported in the common case.
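The nesting argument can be checked by applying the convention repeatedly. The following Python sketch is only a simulation under the same assumptions as the rules listed earlier; propagated_code and codes_up_the_chain are hypothetical names, and a negative input stands for "killed by that signal number".

```python
def propagated_code(returncode):
    """Apply the convention once (negative input = killed by signal)."""
    if returncode < 0:
        return 128 + (-returncode)
    if returncode >= 128:
        return 128
    return returncode


def codes_up_the_chain(returncode, levels):
    """Exit codes of each successive ancestor, nearest ancestor first.

    Every ancestor is assumed to exit normally with the propagated code.
    """
    codes = []
    for _ in range(levels):
        returncode = propagated_code(returncode)
        codes.append(returncode)
    return codes


# C killed by signal 1 (SIGHUP): P exits 129, G exits 128, and so on.
print(codes_up_the_chain(-1, 3))   # [129, 128, 128]
# C exited 42: the exact code survives at every level.
print(codes_up_the_chain(42, 3))   # [42, 42, 42]
```

Only the direct parent learns the signal number; every ancestor further up still learns that something at or above 128 happened below it.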

Summary of common exit codes for execline programs

  • 0: success. This code is rarely encountered, because most execline programs chainload into something else when they succeed, instead of exiting 0.
  • 100: wrong usage
  • 111: system call failed
  • 126: unable to chainload into another program (any error other than ENOENT)
  • 127: unable to chainload into another program (executable not found)