🧵 Execve & Fork Internals

🧠 Overview

This module goes deep into how a POSIX‑style shell actually creates and replaces processes:

  • fork() / vfork() / clone() (conceptually)
  • execve() and the exec family
  • file descriptor inheritance and CLOEXEC
  • PATH lookup and execve failures
  • shebang handling (#!)
  • pipelines and process graphs
  • how this all behaves in containers and CI

The goal: when you see cmd1 | cmd2, you should be able to mentally draw the process tree and FD graph.


🎓 Who this is for

  • DevOps/SRE debugging stuck pipelines, zombie leaks, or weird FD behavior.
  • Engineers writing entrypoints or process supervisors in shell.
  • People integrating shell with other runtimes (agents, runners, task executors).
  • Anyone who wants to understand what really happens between bash and the kernel.

You should already be comfortable with:

  • basic shell scripting
  • processes and PIDs
  • exit codes
  • redirections and pipelines

🧩 Role in the ecosystem

Exec/fork internals underpin every external command the shell runs. If you don’t understand how processes are created and replaced, you’re guessing when debugging:

  • “Why doesn’t this env var show up?”
  • “Why is this FD still open?”
  • “Why does this pipeline hang?”
  • “Why does this script behave differently in CI vs locally?”

🧩 Internals / Mechanics

🧩 Fork: cloning the shell process

Conceptually:

pid_t pid = fork();
if (pid < 0) {
    // fork failed: no child was created
} else if (pid == 0) {
    // child: fork() returned 0
} else {
    // parent: pid is the child's PID
}

In the shell:

  • Parent: continues the main loop, tracks jobs, waits.
  • Child: inherits:
    • memory (copy-on-write)
    • environment
    • open file descriptors
    • current directory
    • signal dispositions (with some nuances)

The child then typically:

  1. sets up redirections (dup2, close)
  2. adjusts process group / session if needed
  3. calls execve() to replace itself with the target program

If execve() fails, the child usually prints an error and exits with a non-zero status: by convention 127 when the command is not found and 126 when it is found but cannot be executed.
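The three child-side steps above can be approximated in the shell itself. This is a minimal sketch, not how any particular shell is implemented: `spawn_redirected` is a hypothetical helper name, `( ... )` stands in for fork(), `exec >file` for the dup2() step, and `exec "$@"` for execve().

```shell
#!/bin/sh
# Sketch of "fork, redirect, exec" using only shell constructs.
# spawn_redirected is a hypothetical helper, not a standard command.
spawn_redirected() {
    out=$1
    shift
    (
        # ( ... ) forks a child shell.
        exec >"$out"    # redirect the child's stdout, like dup2(fd, 1)
        exec "$@"       # replace the child shell with the target program
    ) &
    wait "$!"           # parent: wait for the child and return its status
}

tmp=$(mktemp)
spawn_redirected "$tmp" echo "hello from the child"
cat "$tmp"              # hello from the child
rm -f "$tmp"
```

The parent's own stdout is untouched because the redirection happened after the fork, inside the child.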


🧩 Execve: replacing the process image

Conceptually:

execve("/usr/bin/ls", argv, envp);
// if we get here, execve failed
perror("execve");
_exit(127);

Key properties:

  • Same PID: execve() does not create a new process; it replaces the current one.
  • New program image: the code, data, stack, and heap are replaced.
  • Environment: passed explicitly as envp (the execvp/execlp wrappers pass the caller’s current environ).
  • File descriptors: remain open unless marked CLOEXEC.

This is why:

  • a process can exec another binary and keep sockets/pipes open.
  • PID‑based supervision still works across exec boundaries.
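A quick way to check the "same PID" property from a shell, as a minimal sketch (`pids_across_exec` is a hypothetical helper name): an outer `sh` prints its PID, then uses `exec` to replace itself with a second `sh`, which prints its own.

```shell
#!/bin/sh
# Sketch: the PID printed before and after exec is the same.
pids_across_exec() {
    # The outer sh prints its PID, then execs a second sh, which prints its own.
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

pids_across_exec   # prints the same PID on both lines
```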

🧩 Exec family and PATH lookup

Common exec variants:

  • execve(path, argv, envp) — no PATH lookup, raw syscall.
  • execvp(file, argv) — uses PATH to search for file.
  • execlp(file, arg0, ..., NULL) — same, but varargs.

Shell behavior:

  • When you run ls, the shell:
    • searches PATH for ls (after checking functions and builtins)
    • builds argv (["ls", ...])
    • builds envp from the exported variables
    • calls execve() with the resolved path, e.g. execve("/usr/bin/ls", argv, envp) (via execvp-like logic)

If PATH lookup fails:

  • command not found
  • exit code is typically 127.
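The PATH search itself is simple enough to sketch in a few lines. The helper below, `find_in_path`, is hypothetical and only approximates what a shell does: real shells also consult aliases, functions, builtins, and a lookup cache before walking PATH.

```shell
#!/bin/sh
# Sketch of a PATH search. find_in_path is a hypothetical helper.
find_in_path() (
    # The body runs in a subshell, so the IFS and globbing changes stay local.
    IFS=:
    set -f                          # no globbing while splitting PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.      # an empty PATH entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1" # first match wins, as in the shell
            exit 0
        fi
    done
    exit 127                        # same status a shell uses for "command not found"
)

find_in_path sh || echo "sh not found on PATH"
```

The first matching directory wins, which is why the order of entries in PATH matters as much as their presence.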

🧩 Shebang (#!) handling

When you run a script file:

./script.sh

The kernel:

  1. Reads the first line.
  2. If it starts with #!, e.g.:

#!/usr/bin/env bash

  3. It runs:

/usr/bin/env bash ./script.sh

(plus any arguments given to the script; on Linux, everything after the interpreter path on the shebang line is passed as a single argument).

Implications:

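  • The interpreter (e.g. bash) is what actually runs the script.
  • With #!/usr/bin/env, the PATH of the calling process decides which interpreter is used.
  • If the shebang is missing, execve() fails with ENOEXEC and most shells fall back to interpreting the file themselves; if the interpreter path is invalid, the exec fails with an error. The details depend on the OS and invoking shell.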
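One way to see the argument list the kernel builds is to make the "interpreter" a program that just prints its arguments. The sketch below assumes `/bin/echo` exists and that the temporary directory allows executing files; `shebang_argv` and the `from-shebang` marker are illustrative names.

```shell
#!/bin/sh
# Sketch: use /bin/echo as the "interpreter" to reveal the argv the kernel builds.
# Assumes /bin/echo exists and the temp directory is not mounted noexec.
shebang_argv() {
    dir=$(mktemp -d)
    script=$dir/demo
    printf '#!/bin/echo from-shebang\n' > "$script"
    chmod +x "$script"
    "$script" a b | sed "s|$dir|DIR|"   # hide the random temp path
    rm -rf "$dir"
}

shebang_argv    # from-shebang DIR/demo a b
```

The output shows the order: the shebang argument first, then the script path, then the arguments passed on the command line.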

🧩 File descriptors and CLOEXEC

When the shell forks:

  • The child inherits all open FDs from the parent (stdin, stdout, stderr, pipes, sockets, logs, etc.).
  • Before execve(), the child may:
    • dup2() FDs to 0, 1, 2 for redirections.
    • close FDs that should not be visible to the child.

CLOEXEC (FD_CLOEXEC) flag:

  • If set on an FD, the kernel automatically closes it on execve().
  • This prevents leaking internal FDs (e.g. listening sockets, control pipes) into child processes.

Architecturally:

  • Without CLOEXEC: every exec can accidentally inherit internal FDs → hangs, resource leaks, security issues.
  • With CLOEXEC: only explicitly passed FDs survive.
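Inheritance across fork and exec is easy to observe from a shell. In this minimal sketch (`inherited_fd_demo` is a hypothetical name), the parent opens FD 3 on a temporary file and a separately executed child shell writes to it without ever opening the file itself.

```shell
#!/bin/sh
# Sketch: a child started with fork + exec can write to an FD the parent opened.
inherited_fd_demo() {
    log=$(mktemp)
    # Open FD 3 for the duration of this command; the child shell inherits it.
    sh -c 'echo "written by child via fd 3" >&3' 3>"$log"
    cat "$log"
    rm -f "$log"
}

inherited_fd_demo   # written by child via fd 3
```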

🧩 Pipelines: process and FD graph

For:

cmd1 | cmd2 | cmd3

The shell typically:

  1. Creates two pipes: p1 (between cmd1 and cmd2), p2 (between cmd2 and cmd3).
  2. Forks three children.
  3. In each child:
    • cmd1:
      • dup2(p1_write, STDOUT_FILENO)
      • closes unused FDs
      • execve(cmd1, ...)
    • cmd2:
      • dup2(p1_read, STDIN_FILENO)
      • dup2(p2_write, STDOUT_FILENO)
      • closes unused FDs
      • execve(cmd2, ...)
    • cmd3:
      • dup2(p2_read, STDIN_FILENO)
      • closes unused FDs
      • execve(cmd3, ...)
  4. In the parent: closes its own copies of all pipe FDs and waits for the children.

If any process keeps a pipe write end open:

  • readers may never see EOF → pipeline hangs.

This is a classic source of “mysterious” hangs in complex scripts.
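The effect can be reproduced safely with a short delay instead of a real hang. In this sketch (the function names are illustrative), a background `sleep` either inherits the pipe's write end or does not, and the reader's wait time shows the difference.

```shell
#!/bin/sh
# Sketch: a stray holder of the pipe's write end delays EOF for the reader.
leaky() {
    # The background sleep inherits stdout, which is the pipe to cat,
    # so cat cannot see EOF until sleep exits (about 2 seconds).
    { sleep 2 & echo data; } | cat
}

tight() {
    # Same pipeline, but sleep's stdout points elsewhere, so the pipe's
    # write end closes as soon as the left side finishes.
    { sleep 2 >/dev/null & echo data; } | cat
}

elapsed() {
    # Print how many whole seconds the given command took.
    start=$(date +%s)
    "$@" >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

echo "leaky: $(elapsed leaky)s"   # roughly 2
echo "tight: $(elapsed tight)s"   # roughly 0
```

Replace `sleep 2` with a long-running daemon and the two-second delay becomes an indefinite hang.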


🧩 Subshells vs exec

Subshell:

( some commands )
  • Implemented via fork() (new process).
  • Runs a copy of the shell with the same environment and state snapshot.
  • Changes to variables, cd, etc. do not affect the parent.

Exec in the current shell:

exec some-command
  • No new process is created.
  • The current shell process is replaced by some-command.
  • Useful in:
    • PID 1 entrypoints
    • the final step of a script where you don’t need the shell anymore
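The subshell half of this comparison can be checked directly. A minimal sketch, with `subshell_isolation` and the `mode` variable as illustrative names:

```shell
#!/bin/sh
# Sketch: changes made inside ( ... ) stay in the forked child.
subshell_isolation() {
    mode=outer
    start_dir=$PWD
    ( mode=inner; cd /; echo "inside:  mode=$mode pwd=$PWD" )
    if [ "$PWD" = "$start_dir" ]; then moved=no; else moved=yes; fi
    echo "outside: mode=$mode moved=$moved"
}

subshell_isolation
# inside:  mode=inner pwd=/
# outside: mode=outer moved=no
```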

🔧 Techniques

🔧 Use exec in PID 1 entrypoints

In containers:

#!/bin/sh
# bad: shell stays as PID 1, app is child
run-app "$@"

# better:
exec run-app "$@"

Benefits:

  • The app becomes PID 1.
  • Signals go directly to the app.
  • No extra shell process to manage.

If you need the shell as a supervisor, that’s a different pattern (and you must handle SIGCHLD, wait, etc.).


🔧 Use CLOEXEC for internal FDs

In languages like Python/Go/Rust, set CLOEXEC on:

  • internal control pipes
  • listening sockets
  • log pipes

So that when you exec tools from your process, they don’t inherit those FDs.

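In shell, you can’t set CLOEXEC directly, but you can close an FD explicitly, either for a single command (cmd 3>&-) or for the rest of the script (exec 3>&-). You should still assume that tools you call might leak FDs if they don’t use CLOEXEC.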
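As a rough shell-level substitute for CLOEXEC, an FD can be closed for one command with the `N>&-` redirection. The sketch below uses a hypothetical probe function, `fd3_in_child`, that reports whether a child process can see FD 3.

```shell
#!/bin/sh
# Sketch: report whether a child process can see FD 3.
fd3_in_child() {
    # The child tries to write to FD 3; this only works if FD 3 was inherited.
    if sh -c 'echo probe >&3' 2>/dev/null; then
        echo inherited
    else
        echo closed
    fi
}

fd3_in_child 3>/dev/null   # inherited
fd3_in_child 3>&-          # closed
```

Unlike CLOEXEC, this has to be repeated at every call site, so it is easy to miss one.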


🔧 Debug PATH and exec failures

When cmd fails with “not found”:

  • Check echo "$PATH".
  • Use type cmd or command -v cmd.
  • Use strace -f -e execve sh script.sh to see what the shell is actually trying to exec.
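The exit status already narrows down which kind of exec failure occurred. This sketch distinguishes the two conventional codes; `status_of` is a hypothetical helper and `no-such-command-xyz` is a made-up name assumed not to exist on the system.

```shell
#!/bin/sh
# Sketch: tell "not found" (127) apart from "found but not executable" (126).
status_of() {
    # Run the given command in a child shell, discard its output,
    # and print the exit status.
    rc=0
    sh -c '"$@"' sh "$@" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

plain_file=$(mktemp)    # mktemp creates the file without execute permission
echo "missing command: $(status_of no-such-command-xyz)"   # 127
echo "non-executable:  $(status_of "$plain_file")"         # 126
rm -f "$plain_file"
```

A 126 usually points at permissions or a mount option rather than PATH, so the two cases call for different fixes.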

🔧 Visualize process trees

Use:

ps f
pstree -p

to see:

  • which process exec’d what
  • which PIDs are still shells
  • where your app actually lives in the tree

⚠️ Pitfalls

⚠️ Shell as a supervisor without understanding exec/fork

Using shell as a long‑running supervisor:

while true; do
  run-worker
  sleep 1
done

…without:

  • wait for children
  • proper signal handling
  • understanding FD inheritance

…leads to:

  • zombie accumulation
  • stuck FDs
  • broken shutdown
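For contrast, here is a minimal sketch of a loop that does reap its worker and forward signals. It is an illustration under stated assumptions, not a production supervisor: `supervise` and its `max_runs` limit are hypothetical (the limit exists only so the example terminates), and the restart delay is omitted.

```shell
#!/bin/sh
# Sketch of a supervising loop that reaps its worker and forwards signals.
# supervise and max_runs are hypothetical; a real supervisor would loop forever.
supervise() {
    max_runs=$1
    shift
    runs=0
    child=
    # Forward TERM/INT to the current worker, then stop supervising.
    trap '[ -z "$child" ] || kill -TERM "$child" 2>/dev/null; exit 143' TERM INT
    while [ "$runs" -lt "$max_runs" ]; do
        "$@" &                      # background, so the shell can run the trap
        child=$!
        status=0
        wait "$child" || status=$?  # reap the worker and keep its exit status
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
    done
    trap - TERM INT
}

supervise 2 sh -c 'exit 3'
# run 1 exited with status 3
# run 2 exited with status 3
```

Running the worker in the background and blocking in `wait` is what lets the trap fire promptly; a foreground worker would delay the handler until the worker exits.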

⚠️ Leaking FDs into children

If a parent process:

  • opens a socket or pipe
  • then execs tools without CLOEXEC

…those tools may:

  • keep FDs open
  • prevent EOF on pipes
  • keep ports bound
  • cause “address already in use” or hangs

⚠️ Misusing exec in the middle of scripts

echo "starting"
exec some-command
echo "this will never run"

After a successful exec, the shell is gone. Anything after it is dead code.


⚠️ PATH‑dependent behavior

Scripts that rely on:

  • PATH containing specific directories
  • env resolving to a specific binary
  • bash being at /bin/bash

…behave differently across:

  • distros
  • containers
  • CI runners

🚨 Real‑world failures

🚨 Failure: CI job hangs due to inherited FD

Scenario:

  • A test runner opens a pipe/socket.
  • It then execs a child process that runs tests.
  • The child inherits the FD and never closes it.
  • The parent waits for EOF on the pipe → never comes → CI job hangs.

Root cause:

  • No CLOEXEC on internal FDs.
  • No explicit FD management before exec.

🚨 Failure: Container doesn’t stop on SIGTERM

Scenario:

CMD ["sh", "-c", "run-app.sh"]
  • sh is PID 1.
  • run-app.sh is a child.
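  • sh doesn’t forward signals to its child.
  • docker stop sends SIGTERM to PID 1 → the shell exits or ignores it (PID 1 gets no default action for signals it has not handled) → the app keeps running until the SIGKILL that follows the grace period, or dies uncleanly.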

Fix:

  • Use exec in the entrypoint:
exec run-app.sh
  • Or use a minimal init (tini, dumb-init).

🚨 Failure: “Command not found” only in CI

Scenario:

  • Locally, PATH includes /usr/local/bin; in CI, it doesn’t.
  • Script calls my-tool assuming it’s globally available.
  • In CI, execvp can’t find it → command not found.

Fix:

  • Validate tools explicitly at the top:
command -v my-tool >/dev/null 2>&1 || {
  echo "my-tool is required" >&2
  exit 1
}
  • Or use absolute paths.

🛠️ Patterns

🛠️ Pattern: Final exec in entrypoints

#!/bin/sh
set -e

# setup, env, migrations, etc.
prepare_app

# replace shell with the app
exec "$@"
  • No extra shell process.
  • Clean signal behavior.
  • Predictable shutdown.
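The pattern can be exercised without a container. In this sketch the entrypoint body is held in a string and run with `sh -c`; `APP_MODE`, `run_entrypoint`, and the sample command are hypothetical stand-ins for real setup and a real app.

```shell
#!/bin/sh
# Sketch: an entrypoint body that does setup, then execs the real command.
# APP_MODE is a hypothetical setting used only for illustration.
entrypoint='
set -e
APP_MODE=${APP_MODE:-production}
export APP_MODE
exec "$@"
'

run_entrypoint() {
    # "entrypoint" becomes $0; the remaining arguments are the command to exec.
    sh -c "$entrypoint" entrypoint "$@"
}

rc=0
run_entrypoint sh -c 'echo "mode=$APP_MODE arg=$1"; exit 7' app "two words" || rc=$?
echo "exit status: $rc"   # exit status: 7
```

Because of the final `exec "$@"`, arguments keep their word boundaries and the command's exit status becomes the entrypoint's exit status with no shell in between.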

🛠️ Pattern: Explicit process graph thinking

When designing:

cmd1 | cmd2 | cmd3

ask:

  • How many processes?
  • Who owns which FDs?
  • Who closes which ends of which pipes?
  • What happens on SIGINT?

This prevents “mysterious” hangs and partial shutdowns.


🛠️ Pattern: Use exec in small wrappers

Instead of:

#!/bin/sh
my-real-binary "$@"

use:

#!/bin/sh
exec my-real-binary "$@"

So that:

  • there’s no extra shell layer
  • PID, signals, and exit codes map directly to the real binary

❌ Anti‑patterns

  • using shell as a complex, long‑running supervisor without understanding fork/exec
  • relying on PATH and shebangs without validation
  • ignoring FD inheritance and CLOEXEC
  • sprinkling exec randomly in the middle of scripts
  • assuming “PID 1 is just another process”

🔍 Debugging

🔍 Trace exec/fork with strace

strace -f -e trace=process sh script.sh

You’ll see:

  • fork() / clone() calls
  • execve() calls
  • which binaries are actually executed
  • which paths are tried

🔍 Inspect open FDs

Inside a shell on Linux ($$ expands to the shell’s own PID):

ls -l /proc/$$/fd

You’ll see:

  • which FDs are open
  • which pipes/sockets/files are still alive

This is invaluable for debugging hangs and leaks.


🧠 Summary

Execve & fork internals are the mechanical heart of shell execution:

  • fork() clones the shell.
  • execve() replaces the child with the target program.
  • FDs are inherited unless CLOEXEC is used.
  • PATH and shebangs decide what actually runs.
  • Pipelines are just process graphs + FD wiring.

Once you can mentally simulate fork/exec and FD inheritance, you stop guessing and start designing process behavior—especially in containers, CI, and production automation.