PATH isn't real on Linux

On a fresh installation of Debian 12 (bookworm), executing echo $PATH shows the following output:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

/usr/bin contains /usr/bin/cat, and /usr/bin is in PATH, so just typing cat will run /usr/bin/cat. But what exactly is doing this lookup?

By running strace cat, you can see the Linux system calls that are used:

c
execve("/usr/bin/cat", ["cat"], 0x7ffdfb2367a0 /* 63 vars */) = 0

The execve call already hands the Linux kernel the full path (/usr/bin/cat), so where is the /usr/bin prefix coming from?

Reading the source

On Debian, /bin/sh is dash, which interprets and executes the commands in shell scripts. Its entry point is in main.c:

c
/*
 * Main routine.  We initialize things, parse the arguments, execute
 * profiles if we're a login shell, and then call cmdloop to execute
 * commands.  The setjmp call sets up the location to jump to when an
 * exception occurs.  When an exception occurs the variable "state"
 * is used to figure out how far we had gotten.
 */

// ...

static int
cmdloop(int top)
{
    // ...
    for (;;) {
        // ...
        n = parsecmd(inter);
        // ...
        i = evaltree(n, 0);
    }
}

evaltree in eval.c is responsible for executing commands:

c
int
evaltree(union node *n, int flags)
{
    // ...
    case NCMD:
        evalfn = evalcommand;
checkexit:
        checkexit = EV_TESTED;
        goto calleval;
    // ...
calleval:
        status = evalfn(n, flags);
        break;

For a simple command that just runs a program, evalcommand is the final step:

c
STATIC int
// ...
evalcommand(union node *cmd, int flags, struct backcmd *backcmd)
{
    // ...
    default:
        flush_input();

        /* Fork off a child process if necessary. */
        if (!(flags & EV_EXIT) || have_traps()) {
            INTOFF;
            jp = vforkexec(cmd, argv, path, cmdentry.u.index);
            break;
        }
        shellexec(argv, path, cmdentry.u.index);
    // ...
}

shellexec in exec.c calls padvance:

c
void
shellexec(char **argv, const char *path, int idx)
{
    // ...
    while (padvance(&path, argv[0]) >= 0) {
        cmdname = stackblock();
        if (--idx < 0 && pathopt == NULL) {
            tryexec(cmdname, argv, envp);
            if (errno != ENOENT && errno != ENOTDIR)
                e = errno;
        }
    }
    // ...
}

But what is padvance? Looking further into exec.c:

c
/*
 * Do a path search.  The variable path (passed by reference) should be
 * set to the start of the path before the first call; padvance will update
 * this value as it proceeds.  Successive calls to padvance will return
 * the possible path expansions in sequence.  If an option (indicated by
 * a percent sign) appears in the path entry then the global variable
 * pathopt will be set to point to it; otherwise pathopt will be set to
 * NULL.
 *
 * If magic is 0 then pathopt recognition will be disabled.  If magic is
 * 1 we shall recognise %builtin/%func.  Otherwise we shall accept any
 * pathopt.
 */

const char *pathopt;

int padvance_magic(const char **path, const char *name, int magic)
{

The shell, not the Linux kernel, is responsible for searching for executables in PATH!
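
One way to check this claim is to hand the kernel a bare command name and see what happens. The sketch below is an illustration, not something taken from the dash source: exec_exit_code is a hypothetical helper, and it assumes a POSIX system with a true binary somewhere on PATH. os.execv maps onto execve with no PATH search, while os.execvp searches PATH in userspace before calling execve.

```python
import os
import shutil
import tempfile


def exec_exit_code(exec_fn, name):
    """Fork, call exec_fn(name, [name]) in the child, return the child's exit code.

    Exit code 127 means the exec call itself failed (a convention borrowed
    from what shells report for "command not found").
    """
    with tempfile.TemporaryDirectory() as empty_dir:
        pid = os.fork()
        if pid == 0:
            # Child: move to an empty directory so a bare name cannot
            # accidentally resolve against the current directory.
            try:
                os.chdir(empty_dir)
                exec_fn(name, [name])
            except OSError:
                pass
            os._exit(127)
        _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    full_path = shutil.which("true")  # PATH lookup done by Python, in userspace
    print(exec_exit_code(os.execv, "true"))     # bare name, no PATH search
    print(exec_exit_code(os.execv, full_path))  # full path handed to the kernel
    print(exec_exit_code(os.execvp, "true"))    # Python searches PATH, then execs
```

On a typical Linux system this should print 127, 0 and 0: the bare name fails when given straight to execve, and succeeds only once something in userspace has turned it into a full path.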

What about other code?

Python's subprocess can be used like this:

python
subprocess.run(["ls", "-l"])

This calls /usr/bin/ls since /usr/bin is in PATH. But who's doing the path lookup?

CPython contains the following code for subprocess:

python
# This matches the behavior of os._execvpe().
executable_list = tuple(
    os.path.join(os.fsencode(dir), executable)
    for dir in os.get_exec_path(env))

This builds the list of candidate paths from PATH in Python itself, before anything is handed to Linux's execve.
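
The same idea can be sketched as a standalone function. This is a simplified illustration rather than CPython's actual implementation: path_candidates and find_executable are hypothetical names, and the sketch checks each candidate with os.access instead of attempting execve on each one in turn.

```python
import os
import tempfile


def path_candidates(name, search_path):
    """Return every path a PATH-style search would try for name, in order."""
    if "/" in name:
        # A name containing a slash is used as-is; PATH is not consulted.
        return [name]
    # An empty PATH entry conventionally means the current directory.
    return [os.path.join(d or ".", name) for d in search_path.split(os.pathsep)]


def find_executable(name, search_path):
    """Return the first candidate that is an executable regular file, or None."""
    for candidate in path_candidates(name, search_path):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        tool = os.path.join(second, "mytool")  # hypothetical program name
        with open(tool, "w") as f:
            f.write("#!/bin/sh\necho hi\n")
        os.chmod(tool, 0o755)
        search_path = first + os.pathsep + second
        print(path_candidates("mytool", search_path))  # both directories, in order
        print(find_executable("mytool", search_path))  # the copy in the second one
```

The order of the entries matters: the first directory that contains a matching executable wins, which is why /usr/local/bin can shadow /usr/bin in the PATH shown earlier.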

Go is similar: lp_unix.go contains its own implementation to search in PATH:

go
// LookPath searches for an executable named file in the
// directories named by the PATH environment variable.
// If file contains a slash, it is tried directly and the PATH is not consulted.
// Otherwise, on success, the result is an absolute path.
//
// In older versions of Go, LookPath could return a path relative to the current directory.
// As of Go 1.19, LookPath will instead return that path along with an error satisfying
// [errors.Is](err, [ErrDot]). See the package documentation for more details.
func LookPath(file string) (string, error) {

Rust's Command::spawn eventually calls libc::execvp, which searches PATH:

c
/* Execute FILE, searching in the `PATH' environment variable if it contains
   no slashes, with arguments ARGV and environment from `environ'.  */
int
execvp (file, argv)
     const char *file;
     char *const argv[];
{

In fact, the Linux kernel doesn't know about PATH at all! The interpreter named in a shebang line is opened exactly as written, with no PATH search, so in practice it has to be an absolute path:

sh
#!/bin/sh

works, but

sh
#!sh

doesn't. This is also the reason many programs use this trick:

py
#!/usr/bin/env python
print('Hello world')

since /usr/bin/env calls execvp, which will search PATH.
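
The three shebang variants can be compared with a small experiment. This is a sketch under a few assumptions: /bin/sh and /usr/bin/env exist, the temporary directory allows executing files, and run_with_shebang and the script name hello are hypothetical. The script is started through an absolute path, so Python's own PATH lookup plays no part and only the kernel's handling of the shebang line is exercised.

```python
import os
import subprocess
import tempfile


def run_with_shebang(shebang):
    """Write a tiny script with the given shebang, execute it, and return
    its stdout, or None if the script could not be started."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "hello")  # hypothetical script name
        with open(script, "w") as f:
            f.write(shebang + "\necho hello\n")
        os.chmod(script, 0o755)
        try:
            # cwd=workdir keeps a stray ./sh in the caller's directory
            # from satisfying the "#!sh" case by accident.
            result = subprocess.run(
                [script], cwd=workdir, capture_output=True, text=True
            )
        except OSError:
            return None
        return result.stdout


if __name__ == "__main__":
    for line in ("#!/bin/sh", "#!sh", "#!/usr/bin/env sh"):
        print(repr(line), "->", repr(run_with_shebang(line)))
```

With those assumptions, the #!/bin/sh and #!/usr/bin/env sh scripts should print hello, while the #!sh script fails to start because nothing searches PATH for the interpreter.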