execline: why execline and not sh

Why not just use `/bin/sh`?

Security

One of the most frequent sources of security problems in programs is parsing. Parsing is a complex operation, and it is easy to make mistakes while designing and implementing a parser. (See what Dan Bernstein says on the subject, section 5.)

But shells parse all the time. Worse, the essence of the shell is parsing: the parser and the runner are intimately interleaved and, because of the way the language is specified, cannot be cleanly separated. The shell performs several kinds of expansions, automatic filename globbing, and automatic word splitting, all in an unintuitive order, and users have to memorize numerous arbitrary quoting rules to get the result they want. Many pages list the common mistakes, which more often than not lead to security holes. Did you know that "$@" is a special case of double quoting? It expands to one separate word per positional parameter, whereas every other use of double quotes in a shell is meant to prevent splitting.

execlineb parses the script only once: when reading it. The parser has been designed to be simple and systematic, to reduce the potential for bugs; you just cannot do that with a shell. After execlineb has split the script into words, no other parsing phase happens unless the user explicitly requests it. Positional parameters, when used, are never split, even if they contain spaces or newlines, unless the user explicitly requests it. Users control exactly what is split, what is done, and how.

Portability

The shell language was designed to make scripts portable across various versions of Unix. But it is actually really hard to write a portable shell script. There are dozens of distinct sh flavours, not even counting the openly incompatible csh approach and its various tcsh-like followers. The ash, bash, ksh and zsh shells all behave differently, even when they are run in their so-called compatibility mode. From what I have seen in various experiments, only zsh is able to follow the specification to the letter, at the expense of being big and complex to configure. This is a source of endless problems for shell script writers, who should be able to assume that a script will run everywhere, but cannot in practice. Even a simple utility like test cannot be used safely with the standardized options, because most shells come with a builtin test that does not respect the specification to the letter. And let's not even get started on echo, which has its own set of problems. Rich Felker has a page listing tricks for writing portable shell scripts. Writing a portable script should not be that hard.

execline scripts are portable. There is no complex syntax that leaves room for undefined or nonportable behaviour. The execline package itself is portable across platforms, so there is no reason for vendors or distributors to fork their own incompatible versions. Scripts will not break from one machine to another; if they do, it's not a "portability problem", it's a bug. You are then encouraged to find the program responsible for the different behaviour and send a bug report to its author, including me if the relevant program is part of the execline distribution.

A long-standing problem with Unix scripts is the shebang line, which requires an absolute path to the interpreter. Scripts are only portable as is if the interpreter can be found at the same absolute path on every system. With /bin/sh, it is almost the case (Solaris manages to get it wrong by having a non-POSIX shell as /bin/sh and requiring something like #!/usr/xpg4/bin/sh to get a POSIX shell to interpret your script). Other scripting languages are not so lucky: perl can be /bin/perl, /usr/bin/perl, /usr/local/bin/perl or something else entirely. For those cases, some people advocate the use of env: #!/usr/bin/env perl. But first, env can only find interpreters that can be found via the user's PATH environment variable, which defeats the purpose of having an absolute path in the shebang line in the first place; and second, this only displaces the problem: the env utility does not have a guaranteed absolute path. /usr/bin/env is the usual convention, but not a strong guarantee: it is valid for systems to have /bin/env instead, for instance.

execline suffers from the same issue: should the shebang line be #!/bin/execlineb or #!/usr/bin/execlineb? This is the only portability problem that you will find with execline, and it is common to every scripting language.

The real solution to this portability problem is a convention that guarantees fixed absolute paths for executables, which the FHS does not do. The slashpackage convention is such an initiative, and is well-designed; but as with every convention, it only works if everyone follows it, and unfortunately, slashpackage has not found many followers. Nevertheless, like every skarnet.org package, execline can be configured to follow the slashpackage convention.

Simplicity

I originally wanted a shell that could be used on an embedded system. Even the ash shell seemed big, so I thought of writing my own. Hence I had a look at the sh specification... and ran away screaming. This specification is insane. It goes against every good programming practice; it seems to have been designed only to give headaches to wannabe sh implementors.

POSIX cannot really be blamed for that: it only standardizes existing, historical behaviour. One can argue whether it is a good idea to standardize atrocious behaviour for historical reasons, as is the case with the infamous gets function, but that is the way it is.

The fact remains that modern shells have to be compatible with that historical nonsense and that makes them big and complex at best, or incompatible and ridden with bugs at worst. An OpenBSD developer said to me, when asked about the OpenBSD /bin/sh: "It works, but it's far from not being a nightmare".

Nobody should have nightmare-like software on their system. Unix is simple. Unix was designed to be simple. And if, as Dennis Ritchie said, "it takes a genius to understand the simplicity", that's because incompetent people took advantage of the huge Unix flexibility to write insanely crappy or complex software. System administrators can only do a decent job when they understand how the programs they run are supposed to work. People are slowly starting to grasp this (or are they? We finally managed to get rid of sendmail and BIND, but GNU/Linux users seem happy to welcome the era of D-Bus and systemd. Will we ever learn?) - but even sh, a seemingly simple and basic Unix program, is hard to understand when you lift the cover.

So I decided to forgo sh entirely and take a new approach. So far, it has been working. The execline specification is simple and, as I hope to have shown, easy to implement without too many bugs or glitches.

Performance

Since it was made to run on an embedded system, execline was designed to be light in memory usage. And it is.

No overhead due to interactive support.
No overhead due to unneeded features. Since every command performs its task then executes another command, all occupied resources are instantly freed. By contrast, a shell stays in memory during the whole execution time.
Very limited use of the C library. Only the C interface to the kernel's system calls, and some very basic functions like malloc(), are used in the C library. In addition to avoiding the horrible interfaces like stdio and the legacy libc bugs, this approach makes it easy to statically compile execline - you will want to do that on an embedded system, or just to gain performance.

You can have hundreds of execline scripts running simultaneously on an embedded box. Not exactly possible with a shell.

For scripts that do not perform many of the computations a shell can do without calling external programs, execline is faster than the shell. Unlike sh's parser, the execline parser is simple and straightforward; it is actually more of a lexer than a parser, because the execline language has been designed to be LL(1) (keep it simple, stupid). execline scripts are analysed and launched practically without delay.

The best use case of execline is a linear, straightforward script, a simple command line that does not require the shell's processing power. In that case, execline will skip the shell's overhead and win big time on resource usage and execution speed.
For longer scripts that fork a few commands, with a bit of control flow, an execline script will on average run at roughly the same speed as the equivalent shell script, while using fewer resources.
The worst use case of execline is when the shell is used as a programming language, and the script loops over complex internal constructs that execline is unable to replicate without forking. In that case, execline will waste a lot of time in fork/exec system calls that the shell does not have to perform, and be noticeably slower. execline has been designed as a scripting language, not as a programming language: it is efficient at being the glue that ties together programs doing a job, not at implementing a program's logic.

execline limitations