The problem with nsswitch

nsswitch, or Name Service Switch, is a common Unix mechanism for describing how user/group/shadow databases should be accessed. Nowadays it's prevalent on Linux because it's the mechanism used by glibc.

Unfortunately, nsswitch has a certain number of flaws that make it difficult to use in a small and secure environment. In other words, it's crap. Here's why.

nsswitch uses dynamically linked modules.

nsswitch works by reading a configuration file, /etc/nsswitch.conf, and depending on what it reads in this file, loading one or more shared libraries, via dlopen(), into the application. These shared libraries, for instance /lib/libnss_files-2.19.so, are provided by the NSS implementation (glibc on Linux). This mechanism has drawbacks.

It makes it difficult to link programs statically.

Programs using dlopen() are notoriously difficult to use in a static linking environment: by nature, dlopen() is dynamic, and it's practically impossible to make it work reliably and correctly in statically linked programs.

So, small programs that just need a getpwnam() call cannot, for all intents and purposes, be linked statically when the implementation of getpwnam() goes through nsswitch.
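To make this concrete, here is a minimal sketch of the kind of small program in question. The helper name uid_of is made up for illustration. With glibc, linking code like this with -static typically produces a linker warning that getpwnam() in statically linked applications still requires the glibc shared libraries at run time.

```c
#include <pwd.h>
#include <sys/types.h>

/* Hypothetical helper: look up a user's numeric ID by name.
 * Returns 0 and fills *uid on success, -1 if the user is not found
 * or the lookup fails. With glibc, the getpwnam() call below goes
 * through nsswitch. */
int uid_of(const char *name, uid_t *uid)
{
    struct passwd *pw = getpwnam(name);

    if (!pw) return -1;
    *uid = pw->pw_uid;
    return 0;
}
```

The code itself is trivial; the difficulty lies entirely in what the libc does behind getpwnam().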

By contrast, the nsss implementation of getpwnam() works with static linking without trouble, and without pulling the whole libc - only the nsss client library is pulled, and it is quite small.

It dynamically adds third-party code to the process' address space.

This is a common security issue with dynamically loaded modules.

Normally, when you link your executable against a third-party library - in this case, the libc - the library has a public API that you're using, and that API has documented behaviour. Some sanity checks are performed at link time, and if something is terribly wrong, linking fails.

This is not the case with dynamically loaded modules used internally by a library. These modules do not have a contract with you, the application developer, but only with the library that uses them. Some checks are performed at library build time, but not at application build time. When dlopen() is run, it performs some minimal checks at run-time (which is the worst time for checks, because failure causes application downtime!), then loads code and data into your application's address space without ever having verified that the interaction is okay.
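As a rough illustration of how little is verified, here is a sketch of run-time symbol resolution with dlopen() and dlsym(). It is not glibc's actual NSS loader, and runtime_lookup is a made-up name. On older glibc versions, code like this needs -ldl at link time.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: resolve `symbol` in the shared object `libname`
 * (NULL means the running program and its dependencies).
 * Returns 0 and stores the address in *out, or -1 on failure.
 * The only checks are "the object could be loaded" and "a symbol with
 * that name exists". Nothing verifies its type or its behaviour. */
int runtime_lookup(const char *libname, const char *symbol, void **out)
{
    void *handle = dlopen(libname, RTLD_NOW);
    void *sym;

    if (!handle) return -1;
    sym = dlsym(handle, symbol);
    if (!sym) {
        dlclose(handle);
        return -1;
    }
    *out = sym;   /* the handle is deliberately kept open */
    return 0;
}
```

The caller then has to cast the returned pointer to whatever function type it believes the symbol has, and no compiler or linker ever checks that belief.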

It would be extremely easy for a malicious third party to use dynamically loaded modules to inject subtly bad code that makes your application behave in unintended ways. And even with benevolent library authors, this mechanism makes bugs more subtle and harder to catch.

By contrast, nsss doesn't load its backends into the client's address space - only the fallback nsss-unix implementation using /etc/passwd is linked client-side, and there's even an option to disable that. All the complex backend code lives server-side in the appropriate nsssd daemon, sharing no address space with the application.

nsswitch adds a configuration parser and a decision automaton to the application.

nsswitch's configuration is done via the /etc/nsswitch.conf file, a human-friendly text file. The first time a user database function is called, the file is read and parsed, and then for all subsequent user database function calls, a decision automaton (that results from this parsing) is run so the engine knows which sequence of backends to call in which situation.
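To give an idea of what this means in code, here is a deliberately simplified sketch of a parser for a single line such as "passwd: files ldap". It is not glibc's parser: the struct db_rule type, the size limits and the function name are all assumptions made for illustration, and it ignores comments, leading whitespace and the [STATUS=action] items a real file may contain.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SOURCES 8
#define MAX_NAME 32

/* Hypothetical structure: the ordered backends for one database. */
struct db_rule {
    char database[MAX_NAME];
    char sources[MAX_SOURCES][MAX_NAME];
    size_t nsources;
};

/* Parse one line of the form "database: source1 source2 ...".
 * Returns 0 on success, -1 on a syntax error or if a limit is exceeded. */
int parse_rule(const char *line, struct db_rule *rule)
{
    const char *colon = strchr(line, ':');
    const char *p;
    size_t len;

    if (!colon) return -1;
    len = (size_t)(colon - line);
    if (len == 0 || len >= MAX_NAME) return -1;
    memcpy(rule->database, line, len);
    rule->database[len] = '\0';
    rule->nsources = 0;

    p = colon + 1;
    for (;;) {
        p += strspn(p, " \t\n");      /* skip separators */
        if (!*p) break;
        len = strcspn(p, " \t\n");    /* length of the next source name */
        if (len >= MAX_NAME || rule->nsources == MAX_SOURCES) return -1;
        memcpy(rule->sources[rule->nsources], p, len);
        rule->sources[rule->nsources][len] = '\0';
        rule->nsources++;
        p += len;
    }
    return rule->nsources ? 0 : -1;   /* a rule with no source is an error */
}
```

Even this toy version has length limits and several error paths, and the decision automaton still has to be built on top of its output.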

All this, obviously, happens at run-time, in the application's address space. Maybe it's time for a quick reminder that:

parsing is bad - most people can't write parsers, and bugs love them (both the parsers and these people)
run-time is the worst time for syntax errors, and any other errors that could and should be caught earlier
library code should be kept as simple as possible and a dynamic decision automaton doesn't qualify as "simple"
every line of code linked into a critical application (such as login) is attack surface

The nsswitch configuration model goes against all these basic programming principles.

By contrast, nsss:

performs no parsing at all - and if a generic backend ever needs parsing, it will be done in its own process address space, not in the application's.
has the simplest possible decision engine: "if contacting the backend fails, fall back on the Unix mechanism". And even that can be overridden at application build time. If a more complex decision engine is needed, it can be implemented, say it with me, in a backend that has its own address space.
frontloads as many decisions as possible before application run time. The backend used by applications is determined when the nsssd service starts, and can be changed by modifying and restarting this service; the burden of determining which backend to run is not carried by applications.
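The decision engine described above is small enough to sketch in a few lines. This is not the nsss source or its API: the status values, the lookup_fn type and the two stub functions are assumptions made up for illustration, and passing NULL as the fallback merely stands in for disabling it at build time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical statuses: the backend answered (found or not found),
 * or it could not be contacted at all. */
enum lookup_status {
    LOOKUP_OK = 0,
    LOOKUP_NOTFOUND = 1,
    LOOKUP_UNREACHABLE = -1
};

typedef enum lookup_status (*lookup_fn)(const char *name, unsigned int *uid);

/* Fall back on the Unix mechanism only if the backend cannot be
 * contacted. An answer from the backend, even "not found", is final. */
enum lookup_status lookup_with_fallback(lookup_fn backend, lookup_fn unix_fallback,
                                        const char *name, unsigned int *uid)
{
    enum lookup_status st = backend ? backend(name, uid) : LOOKUP_UNREACHABLE;

    if (st != LOOKUP_UNREACHABLE) return st;
    return unix_fallback ? unix_fallback(name, uid) : LOOKUP_UNREACHABLE;
}

/* Stub standing in for a daemon that is not running. */
enum lookup_status stub_unreachable_backend(const char *name, unsigned int *uid)
{
    (void)name;
    (void)uid;
    return LOOKUP_UNREACHABLE;
}

/* Stub standing in for a /etc/passwd lookup that only knows "root". */
enum lookup_status stub_unix_lookup(const char *name, unsigned int *uid)
{
    if (strcmp(name, "root") != 0) return LOOKUP_NOTFOUND;
    *uid = 0;
    return LOOKUP_OK;
}
```

There is no configuration to read and no state to build: the whole policy is one comparison.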