PATH isn't real on Linux
On a fresh installation of Debian 12 (bookworm), executing echo $PATH shows the following output:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/bin contains /usr/bin/cat, and /usr/bin is in PATH, so just typing cat will run /usr/bin/cat. But what exactly is doing this lookup?
By running strace cat, you can see the Linux system calls that are used:
c
execve("/usr/bin/cat", ["cat"], 0x7ffdfb2367a0 /* 63 vars */) = 0
The execve call already hands the Linux kernel the full path (/usr/bin/cat), so where is /usr/bin coming from?
Reading the source
On Debian, shell scripts using /bin/sh are run by dash, which is responsible for interpreting and executing commands. Digging into main.c, which contains the entry point:
c
/*
* Main routine. We initialize things, parse the arguments, execute
* profiles if we're a login shell, and then call cmdloop to execute
* commands. The setjmp call sets up the location to jump to when an
* exception occurs. When an exception occurs the variable "state"
* is used to figure out how far we had gotten.
*/
// ...
static int
cmdloop(int top)
{
// ...
for (;;) {
// ...
n = parsecmd(inter);
// ...
i = evaltree(n, 0);
}
}
evaltree in eval.c is responsible for executing commands:
c
int
evaltree(union node *n, int flags)
{
// ...
case NCMD:
evalfn = evalcommand;
checkexit:
checkexit = EV_TESTED;
goto calleval;
// ...
calleval:
status = evalfn(n, flags);
break;
Then, evalcommand is used as the final step when the command is just executing a program:
c
STATIC int
// ...
evalcommand(union node *cmd, int flags, struct backcmd *backcmd)
{
// ...
default:
flush_input();
/* Fork off a child process if necessary. */
if (!(flags & EV_EXIT) || have_traps()) {
INTOFF;
jp = vforkexec(cmd, argv, path, cmdentry.u.index);
break;
}
shellexec(argv, path, cmdentry.u.index);
// ...
}
shellexec in exec.c calls padvance:
c
void
shellexec(char **argv, const char *path, int idx)
{
// ...
while (padvance(&path, argv[0]) >= 0) {
cmdname = stackblock();
if (--idx < 0 && pathopt == NULL) {
tryexec(cmdname, argv, envp);
if (errno != ENOENT && errno != ENOTDIR)
e = errno;
}
}
// ...
}
But what is padvance? Looking further into exec.c:
c
/*
* Do a path search. The variable path (passed by reference) should be
* set to the start of the path before the first call; padvance will update
* this value as it proceeds. Successive calls to padvance will return
* the possible path expansions in sequence. If an option (indicated by
* a percent sign) appears in the path entry then the global variable
* pathopt will be set to point to it; otherwise pathopt will be set to
* NULL.
*
* If magic is 0 then pathopt recognition will be disabled. If magic is
* 1 we shall recognise %builtin/%func. Otherwise we shall accept any
* pathopt.
*/
const char *pathopt;
int padvance_magic(const char **path, const char *name, int magic)
{
The shell, not the Linux kernel, is responsible for searching for executables in PATH!
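To make the idea concrete, here is a minimal Python sketch of that kind of search. It is not dash's implementation: the names path_candidates and find_executable are made up for illustration, it ignores dash's %builtin/%func path options, and it checks permissions up front, whereas dash's shellexec simply tries to execute each candidate and moves on when that fails.

```python
import os
import tempfile


def path_candidates(name, path):
    """Yield each path a PATH search would try, in PATH order."""
    for directory in path.split(":"):
        # An empty PATH entry conventionally means the current directory.
        yield os.path.join(directory, name) if directory else name


def find_executable(name, path):
    """Return the first candidate that is an executable regular file, else None."""
    if "/" in name:
        # A name containing a slash is used as-is; PATH is not consulted.
        return name
    for candidate in path_candidates(name, path):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Look up "cat" using whatever PATH this process inherited.
    print(find_executable("cat", os.environ.get("PATH", "")))
```

On the Debian system above, this would be expected to print /usr/bin/cat, assuming no earlier PATH directory also contains a cat.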
What about other code?
Python's subprocess can be used like this:
python
subprocess.run(["ls", "-l"])
This calls /usr/bin/ls since /usr/bin is in PATH. But who's doing the path lookup?
CPython contains the following code for subprocess:
python
# This matches the behavior of os._execvpe().
executable_list = tuple(
os.path.join(os.fsencode(dir), executable)
for dir in os.get_exec_path(env))
which searches PATH directly in Python before calling out to Linux's execve.
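The same candidate list can be reproduced by hand. The sketch below mirrors the excerpt using only os.get_exec_path and os.fsencode; the helper name candidate_paths and the example PATH value are assumptions made for illustration, not part of CPython.

```python
import os


def candidate_paths(executable, env):
    """Build the list of paths to try, the same way the excerpt above does."""
    name = os.fsencode(executable)
    return tuple(
        os.path.join(os.fsencode(directory), name)
        for directory in os.get_exec_path(env)
    )


if __name__ == "__main__":
    # Hypothetical environment, chosen only for illustration.
    print(candidate_paths("ls", {"PATH": "/opt/tools:/usr/bin"}))
```

Every entry in the result is a full path, which is what each subsequent execve attempt receives.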
Go is similar. Its lp_unix.go contains its own implementation of the PATH search:
go
// LookPath searches for an executable named file in the
// directories named by the PATH environment variable.
// If file contains a slash, it is tried directly and the PATH is not consulted.
// Otherwise, on success, the result is an absolute path.
//
// In older versions of Go, LookPath could return a path relative to the current directory.
// As of Go 1.19, LookPath will instead return that path along with an error satisfying
// [errors.Is](err, [ErrDot]). See the package documentation for more details.
func LookPath(file string) (string, error) {
Rust's Command::spawn eventually calls libc::execvp, a C library function (not a system call), which searches PATH:
c
/* Execute FILE, searching in the `PATH' environment variable if it contains
no slashes, with arguments ARGV and environment from `environ'. */
int
execvp (file, argv)
const char *file;
char *const argv[];
{
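The gap between the raw system call and the PATH-searching wrappers can be observed from Python, where os.execv passes the name straight to the kernel and os.execvp performs a PATH search first. The sketch below is an experiment, not a definitive test: the helper try_exec and the embedded child script are made up for illustration, and it assumes a Linux system with sh somewhere on PATH.

```python
import subprocess
import sys
import tempfile

# Code run in a child interpreter, because a successful exec replaces the process.
CHILD = """
import os, sys
exec_fn = getattr(os, sys.argv[1])  # os.execv or os.execvp
try:
    exec_fn(sys.argv[2], sys.argv[2:])
except FileNotFoundError:
    print("ENOENT")
"""


def try_exec(exec_name, argv):
    """Call os.<exec_name>(argv[0], argv) in a child started in an empty directory.

    Returns whatever ends up on the child's stdout.
    """
    with tempfile.TemporaryDirectory() as empty_dir:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, exec_name, *argv],
            cwd=empty_dir,
            capture_output=True,
            text=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    command = ["sh", "-c", "echo found"]
    print("execv: ", try_exec("execv", command))   # no PATH search
    print("execvp:", try_exec("execvp", command))  # PATH search in userspace
```

The child runs in an empty directory so that the bare name sh cannot accidentally resolve relative to the current directory.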
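In fact, Linux doesn't know about PATH at all! The kernel opens the interpreter named in a shebang as an ordinary path and never consults PATH, so specifying a program in an executable text file using a shebang in practice requires an absolute path: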
sh
#!/bin/sh
works, but
sh
#!sh
doesn't. This is also the reason many programs use this trick:
py
#!/usr/bin/env python
print('Hello world')
since /usr/bin/env calls execvp, which will search PATH.
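The shebang behaviour described above can be checked with a short experiment. This is a sketch under a few assumptions: the helper run_with_shebang is made up for illustration, /bin/sh exists, and the temporary directory is on a filesystem that allows executing files.

```python
import errno
import os
import subprocess
import tempfile


def run_with_shebang(shebang):
    """Write a two-line script starting with `shebang` and try to execute it.

    Returns the script's stdout on success, or the errno name if exec fails.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "hello")
        with open(script, "w") as f:
            f.write(shebang + "\necho hello\n")
        os.chmod(script, 0o755)
        try:
            # Run from the script's own (otherwise empty) directory so that a
            # relative interpreter name has nothing to resolve against.
            result = subprocess.run(
                [script], cwd=workdir, capture_output=True, text=True
            )
        except OSError as exc:
            return errno.errorcode[exc.errno]
        return result.stdout.strip()


if __name__ == "__main__":
    for line in ("#!/bin/sh", "#!sh"):
        print(line, "->", run_with_shebang(line))
```

The second case fails even though sh is on PATH, because the kernel looks for a file literally named sh relative to the current directory and nowhere else.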