🧵 Advanced Shell Process Control
🧠 Overview
Process control is where the shell stops being “a command runner” and becomes a process orchestrator. This includes:
- process groups
- job control
- signal delivery
- foreground/background execution
- subshells
- zombies and reaping
- traps
- PID 1 behavior in containers
This is one of the most misunderstood areas of shell behavior, and one of the most critical for production systems.
🎓 Who this is for
- DevOps/SRE managing long‑running scripts, daemons, or containers.
- Engineers writing orchestration logic or supervising child processes.
- Anyone debugging zombies, hanging pipelines, or broken signal handling.
- People who want deterministic, production‑grade process behavior.
🧩 Internals / Mechanics
🧩 The shell as a process controller
A shell manages:
- processes (PIDs)
- process groups (PGIDs)
- sessions
- terminal control
- signal routing
- job tables
When you run a pipeline, optionally in the background, the shell:
- creates pipes
- forks children
- assigns them to a process group
- optionally puts the group in the background
- tracks them in the job table
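The steps above can be observed with a small sketch. It assumes Bash and a `ps` that accepts `-o pgid=`; the `sleep` stages are stand-ins for real commands.

```bash
#!/usr/bin/env bash
# Start a three-stage pipeline in the background; the shell forks one child per stage.
sleep 1 | sleep 1 | sleep 1 &
last_pid=$!                      # $! is the PID of the last stage

jobs -l                          # the shell's job table entry for the pipeline

# Compare process group IDs. In a script (job control off) the pipeline stays
# in the script's own group; an interactive shell would give it a new PGID.
shell_pgid=$(ps -o pgid= -p "$BASHPID" | tr -d ' ')
job_pgid=$(ps -o pgid= -p "$last_pid" | tr -d ' ')
echo "shell PGID=$shell_pgid pipeline PGID=$job_pgid"

wait                             # reap all stages before exiting
```

Typing the same pipeline at an interactive prompt should show a PGID that differs from the shell's, because job control is enabled there.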
🧩 Foreground vs background
- Foreground job owns the terminal → receives SIGINT, SIGQUIT, etc.
- Background job does NOT own the terminal → signals must be sent manually.
🧩 Process groups
A pipeline typically shares a process group. When a three-command pipeline runs in the foreground, all three commands receive SIGINT when you press Ctrl‑C.
🧩 Subshells and isolation
Subshells:
- have their own PID
- get a copy of the parent's variables; changes do NOT propagate back
- inherit environment
- inherit file descriptors unless redirected
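A minimal sketch of these properties, assuming Bash (`BASHPID` is Bash-specific). The variable names `counter` and `GREETING` are made up for the illustration.

```bash
#!/usr/bin/env bash
counter=1
export GREETING=hello

(
  counter=99                                   # changes only the subshell's copy
  echo "subshell: counter=$counter GREETING=$GREETING"
  echo "subshell PID=$BASHPID (\$\$ still reports $$)"
)

echo "parent: counter=$counter"                # the parent's value is untouched

# Command substitution also runs in a subshell with its own PID.
sub_pid=$(echo "$BASHPID")
echo "parent PID=$BASHPID, substitution ran as PID=$sub_pid"
```

Note that `$$` keeps reporting the original shell's PID inside a subshell, which is why the sketch uses `BASHPID` to show the difference.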
🔧 Techniques
🔧 Use wait to reap children
Waiting on each background child collects its exit status and prevents zombies.
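As a minimal illustration, assuming a POSIX-style shell, the background subshell below stands in for a real worker and exits with an arbitrary status of 3.

```bash
#!/usr/bin/env bash
( sleep 0.2; exit 3 ) &     # hypothetical background worker
pid=$!

wait "$pid"                 # blocks until the child exits and reaps it
status=$?
echo "child $pid exited with status $status"
```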
🔧 Use trap for clean shutdown
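One possible shape for a clean-shutdown trap, sketched under the assumption that the script owns a temporary directory it must remove. `run_job` and `workdir` are hypothetical names; the function body runs in a subshell so the EXIT trap fires when the function finishes.

```bash
#!/usr/bin/env bash
run_job() (
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT     # runs on normal exit and after the exits below
  trap 'exit 130' INT               # 128 + SIGINT
  trap 'exit 143' TERM              # 128 + SIGTERM
  echo "$workdir"
  : > "$workdir/scratch"            # the real work would happen here
)

workdir=$(run_job)
echo "job used $workdir (now removed)"
```

Routing INT and TERM through `exit` keeps all cleanup logic in the single EXIT handler.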
🔧 Use process substitution to avoid unnecessary subshells
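A small comparison, assuming Bash with the `lastpipe` option off (the default). With a pipe, the `while` loop runs in a subshell and its variable updates are lost; with process substitution, the loop stays in the current shell.

```bash
#!/usr/bin/env bash
count=0
printf 'a\nb\nc\n' | while read -r _; do count=$((count + 1)); done
piped_count=$count                     # the loop ran in a subshell
echo "after pipe: $piped_count"

count=0
while read -r _; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
echo "after process substitution: $count"
```

The producer still runs as a separate process; the gain is that the consuming loop keeps its state.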
🔧 Use set -m (job control) only in interactive shells
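Job control is on by default in interactive shells and off in scripts. Avoid enabling it in scripts: it changes process-group and terminal handling in ways that are hard to reason about non-interactively.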
⚠️ Pitfalls
⚠️ Zombie processes from unreaped children
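One way to reproduce a zombie, as a sketch: a shell starts a short-lived child and then replaces itself via `exec` with a program that never calls `wait`. This assumes Bash and a `ps` that supports `-A -o ppid=,stat=`; the durations are arbitrary.

```bash
#!/usr/bin/env bash
# The inner shell forks `sleep 0.1`, then becomes `sleep 3`, which never reaps it.
bash -c 'sleep 0.1 & exec sleep 3' &
parent=$!

sleep 1                                   # give the short-lived child time to exit

# Look up the state of the process whose parent is $parent; Z marks a zombie.
zombie_state=$(ps -A -o ppid=,stat= | awk -v p="$parent" '$1 == p { print $2 }')
echo "child of $parent is in state: $zombie_state"

wait "$parent"                            # once the parent exits, the zombie goes away
```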
⚠️ Ctrl‑C not stopping pipelines
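Ctrl‑C delivers SIGINT to the terminal's foreground process group. If the shell is not managing process groups correctly, pipeline members outside that group never receive the signal and keep running.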
⚠️ Traps not firing in subshells
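A trap set inside a subshell runs in the subshell, not the parent. Traps set in the parent are reset to their defaults when a subshell starts, so the parent's handler does not fire there either.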
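A sketch of both directions, assuming Bash. SIGUSR1 is used only because it is harmless to send to ourselves, and `log` is a scratch file created for the demonstration.

```bash
#!/usr/bin/env bash
log=$(mktemp)
trap 'echo "parent handler" >> "$log"' USR1

# 1. The parent's trap is reset in the subshell, so USR1 takes its default
#    action and terminates the subshell before the echo runs.
( kill -USR1 "$BASHPID"; echo "subshell survived" >> "$log" )

# 2. A trap set inside the subshell applies only to that subshell.
( trap 'echo "subshell handler" >> "$log"' USR1; kill -USR1 "$BASHPID"; : )

# 3. The parent's own trap is still in place.
kill -USR1 "$BASHPID"

result=$(cat "$log")
rm -f "$log"
trap - USR1
echo "$result"
```

Bash may also print a "User defined signal 1" notice on stderr for the first subshell; that is the shell reporting the terminated child.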
⚠️ Using kill -9 as a default
SIGKILL prevents cleanup and can corrupt state.
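A common alternative is to send SIGTERM first and escalate only after a grace period. The helper below is a sketch, assuming Bash; `stop_gracefully` and its default grace period are made-up names and values.

```bash
#!/usr/bin/env bash
# stop_gracefully PID [GRACE_SECONDS]: TERM first, KILL only as a last resort.
stop_gracefully() {
  local pid=$1 grace=${2:-5} i
  kill -TERM "$pid" 2>/dev/null || return 0       # already gone
  for ((i = 0; i < grace * 10; i++)); do
    kill -0 "$pid" 2>/dev/null || return 0        # exited within the grace period
    sleep 0.1
  done
  kill -KILL "$pid" 2>/dev/null                   # still alive: force it
}

sleep 30 &                                        # stand-in for a real service
pid=$!
stop_gracefully "$pid" 2
wait "$pid"
status=$?
echo "process exited with status $status"
```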
🚨 Real‑World Failures
🚨 Failure: Shell script used as PID 1 leaks zombies
In Docker, when a shell script is the container's main command:
sh becomes PID 1 → orphaned processes are reparented to it → if it never waits for them, zombies accumulate.
Fix:
- use tini or dumb-init
- or implement a SIGCHLD handler + wait
🚨 Failure: CI job hangs due to orphaned background process
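A script starts a process in the background and exits without stopping it. The background process keeps running, often holding the job's output stream open → CI never finishes.
Fix: stop the background process, and wait for it, before the script exits.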
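One way to guarantee that cleanup, sketched with an EXIT trap and assuming Bash. `run_step` is a hypothetical CI step and `sleep 300` stands in for a helper such as a dev server.

```bash
#!/usr/bin/env bash
run_step() (
  sleep 300 >/dev/null 2>&1 &          # hypothetical long-running helper
  helper=$!
  # Stop and reap the helper however this step ends.
  trap 'kill "$helper" 2>/dev/null; wait "$helper" 2>/dev/null' EXIT
  echo "$helper"
  true                                 # the real test commands would run here
)

helper_pid=$(run_step)
echo "helper $helper_pid was stopped when the step exited"
```

Without the trap, the command substitution here would itself block until the helper exits, which is the same hang a CI runner sees.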
🚨 Failure: Ctrl‑C doesn’t stop a pipeline
In a pipeline such as cmd1 | cmd2, if the shell doesn’t place both commands in a unified process group, only cmd2 receives SIGINT.
🛠️ Patterns
🛠️ Pattern: Explicit signal handling
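A sketch of explicit handling that forwards SIGTERM to a child, assuming Bash. The `sleep 30` child and the self-sent signal exist only to make the example runnable; in practice the signal would come from outside.

```bash
#!/usr/bin/env bash
sleep 30 &                         # stand-in for the real workload
child=$!

on_term() {
  echo "got SIGTERM, forwarding to $child" >&2
  kill -TERM "$child" 2>/dev/null
}
trap on_term TERM

self=$BASHPID
( sleep 0.3; kill -TERM "$self" ) &   # demo only: simulate an external SIGTERM

wait "$child"                      # returns early (>128) when the trapped signal arrives
wait "$child"                      # second wait collects the child's real exit status
status=$?

wait                               # reap the demo helper
trap - TERM
echo "child exited with status $status"
```

The double `wait` matters: the first call is interrupted by the trap, so only the second one reflects how the child actually ended.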
🛠️ Pattern: Use wait for all children
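A minimal version of this pattern, assuming Bash arrays. The three workers are hypothetical, and worker 2 is made to fail on purpose so the failure path is visible.

```bash
#!/usr/bin/env bash
pids=()
for n in 1 2 3; do
  ( sleep 0.1; [ "$n" -ne 2 ] ) &      # hypothetical worker; number 2 exits non-zero
  pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
  if ! wait "$pid"; then               # reap each child and check its status
    echo "worker $pid failed" >&2
    failed=$((failed + 1))
  fi
done

echo "$failed worker(s) failed"
```

Waiting on each PID individually, rather than a bare `wait`, is what makes the per-child exit status available.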
🛠️ Pattern: Use a minimal init in containers
tini or dumb-init solves:
- zombie reaping
- signal forwarding
- predictable shutdown
❌ Anti‑Patterns
❌ Anti‑pattern: Using shell as a process supervisor
Shell is not systemd. Avoid:
- long‑running loops
- manual restarts
- complex signal routing
❌ Anti‑pattern: Ignoring SIGCHLD
Children that exit without ever being waited for accumulate as zombies.
❌ Anti‑pattern: Running background jobs without cleanup
🔍 Debugging
🔍 Inspect process tree
🔍 Inspect process groups
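A sketch of inspecting parent/child relationships and process groups with `ps`, assuming Bash and a `ps` that supports `-o` with the `pid`, `ppid`, `pgid`, `stat`, and `comm` keywords. `pstree` is optional and only used if installed.

```bash
#!/usr/bin/env bash
sleep 2 &                                   # a child to look at
child=$!

# PID, parent PID, process group, state, and command for this shell and its child.
ps -o pid,ppid,pgid,stat,comm -p "$BASHPID,$child"

child_ppid=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "child $child has parent $child_ppid"

# Tree view, if pstree is available on this system.
if command -v pstree >/dev/null 2>&1; then
  pstree -p "$BASHPID" || true
fi

kill "$child"
wait "$child" 2>/dev/null
```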
🔍 Trace signals
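Attach a tracer such as strace to the process and filter on signal events (for example, strace -f -e trace=signal -p PID) to see which signals arrive and how they are handled.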
⚙️ Performance
⚙️ Avoid excessive forking
Use builtins where possible.
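For instance, parameter expansion can replace calls to `basename`, `dirname`, or `sed` inside hot loops, avoiding one fork per call. The path below is a made-up example.

```bash
#!/usr/bin/env bash
path=/var/log/app/server.log     # hypothetical path

file=${path##*/}                 # like: basename "$path"
dir=${path%/*}                   # like: dirname "$path"
ext=${file##*.}                  # text after the last dot
stem=${file%.*}                  # file name without the extension

echo "dir=$dir file=$file stem=$stem ext=$ext"
```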
⚙️ Avoid long‑running background loops
They consume CPU and complicate shutdown.
⚙️ Use wait -n (Bash) for efficient worker pools
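A sketch of a bounded worker pool, assuming Bash 4.3 or newer for `wait -n`. The task list, the `max_jobs` limit, and the `sleep` workload are placeholders.

```bash
#!/usr/bin/env bash
max_jobs=2
running=0
completed=0

for task in 1 2 3 4 5; do
  if [ "$running" -ge "$max_jobs" ]; then
    wait -n                              # block until any one job finishes
    running=$((running - 1))
    completed=$((completed + 1))
  fi
  ( sleep "0.$task" ) &                  # hypothetical task
  running=$((running + 1))
done

# Drain the jobs that are still running.
while [ "$running" -gt 0 ]; do
  wait -n
  running=$((running - 1))
  completed=$((completed + 1))
done

echo "completed $completed tasks with at most $max_jobs in parallel"
```

Compared with polling in a loop, `wait -n` lets the shell sleep until a slot frees up.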
🐳 Containers
🐳 Shell as PID 1
PID 1 has special semantics:
- ignores signals whose action is left at the default (e.g. SIGTERM) unless a handler is installed
- must reap children
- must forward signals
🐳 Use an init wrapper
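Make tini or dumb-init the container's entrypoint so that it runs as PID 1 and launches your script as its child, or start the container with docker run --init.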
🛰️ CI/CD
🛰️ Ensure deterministic shutdown
CI runners typically cancel jobs with SIGTERM, often followed by SIGKILL after a grace period → scripts must handle SIGTERM and exit promptly.
🛰️ Avoid background jobs unless necessary
They often outlive the job and cause hangs.
🧠 Summary
Process control is the backbone of reliable shell scripting. Mastering:
- process groups
- signals
- job control
- subshells
- reaping
- PID 1 behavior
…is essential for writing safe, predictable, production‑grade automation.