🧵 Advanced Shell Process Control

🧠 Overview

Process control is where the shell stops being “a command runner” and becomes a process orchestrator. This includes:

  • process groups
  • job control
  • signal delivery
  • foreground/background execution
  • subshells
  • zombies and reaping
  • traps
  • PID 1 behavior in containers

This is one of the most misunderstood areas of shell behavior — and one of the most critical for production systems.


🎓 Who this is for

  • DevOps/SRE managing long‑running scripts, daemons, or containers.
  • Engineers writing orchestration logic or supervising child processes.
  • Anyone debugging zombies, hanging pipelines, or broken signal handling.
  • People who want deterministic, production‑grade process behavior.

🧩 Internals / Mechanics

🧩 The shell as a process controller

A shell manages:

  • processes (PIDs)
  • process groups (PGIDs)
  • sessions
  • terminal control
  • signal routing
  • job tables

When you run:

cmd1 | cmd2 &

the shell:

  1. creates pipes
  2. forks children
  3. assigns them to a process group
  4. optionally puts the group in the background
  5. tracks them in the job table
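
Seen from the script's side, those steps can be sketched roughly as follows. The input data and the temporary file are made up for illustration; the point is that `$!` holds the PID of the last pipeline stage and `wait` collects that stage's exit status.

```shell
# Illustrative only: sort three made-up lines in a background pipeline.
out=$(mktemp)

printf '3\n1\n2\n' | sort > "$out" &   # the shell creates the pipe and forks both stages
last_stage=$!                          # $! is the PID of the last stage (sort)

wait "$last_stage"                     # block until sort exits
pipeline_status=$?                     # exit status of sort

sorted=$(cat "$out")
rm -f "$out"
echo "status=$pipeline_status"
echo "$sorted"
```

Note that only the last stage's PID is available this way; the earlier stages are tracked by the shell's job table, not by `$!`.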

🧩 Foreground vs background

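  • Foreground job: its process group owns the terminal → it receives keyboard‑generated signals (SIGINT from Ctrl‑C, SIGQUIT from Ctrl‑\, SIGTSTP from Ctrl‑Z).
  • Background job: does NOT own the terminal → it never sees those keystrokes, so signals must be sent explicitly with kill.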
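
Because a background job never sees Ctrl‑C, the script has to signal it itself. A minimal sketch, using `sleep` as a stand‑in for a real background command:

```shell
sleep 30 &                 # stand-in for a long-running background job
job_pid=$!

kill -TERM "$job_pid"      # deliver the signal explicitly
wait "$job_pid"            # reap the job and collect its status
job_status=$?

# A child terminated by a signal is reported as 128 + the signal number,
# so SIGTERM (15) shows up as 143.
echo "job exited with status $job_status"
```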

🧩 Process groups

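A pipeline typically shares a process group. An interactive shell with job control gives each pipeline its own group; in a script, the stages stay in the script's group:

cmd1 | cmd2 | cmd3

Ctrl‑C sends SIGINT to the whole foreground process group, so all three commands receive it.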

🧩 Subshells and isolation

Subshells:

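  • typically run in a separate process with its own PID (in Bash, $$ still reports the parent's PID; use $BASHPID)
  • start with a copy of the parent's variables; changes do NOT propagate back
  • inherit the environment
  • inherit open file descriptors unless redirected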
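
A short sketch of these properties. The variable names, `GREETING`, and the temporary log file are assumptions made up for the example:

```shell
counter=0
export GREETING=hello               # hypothetical environment variable

( counter=99 )                      # the subshell changes only its own copy
after_subshell=$counter             # still 0 in the parent

seen_by_child=$(echo "$GREETING")   # command substitution is a subshell too

log=$(mktemp)
exec 3>"$log"                       # open fd 3 in the parent
( echo "written via inherited fd" >&3 )   # the subshell inherits fd 3
exec 3>&-                           # close fd 3 again
logged=$(cat "$log")
rm -f "$log"

echo "counter=$after_subshell greeting=$seen_by_child log=$logged"
```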

🔧 Techniques

🔧 Use wait to reap children

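cmd &
pid=$!
wait "$pid"    # blocks until cmd exits and returns its exit status

wait collects the child's exit status, so the finished child does not linger as a zombie.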

🔧 Use trap for clean shutdown

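# cleanup is your own function. Signal names without the SIG prefix are the portable spelling for POSIX sh.
trap 'cleanup; exit 0' INT TERM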
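
The one‑liner above assumes a cleanup function already exists. Below is a minimal sketch of one possible shape, where the temporary directory and the exit codes are illustrative assumptions: an EXIT trap does the cleanup once, and the signal traps only turn the signal into a normal exit.

```shell
run_job() (
  # The body runs in a subshell so the traps below do not leak into the caller.
  workdir=$(mktemp -d) || exit 1

  cleanup() { rm -rf "$workdir"; }
  trap cleanup EXIT          # runs on every exit path of this subshell
  trap 'exit 130' INT        # 128 + 2: turn SIGINT into a normal exit
  trap 'exit 143' TERM       # 128 + 15: same for SIGTERM

  echo "$workdir"            # real work would go here
)

used_dir=$(run_job)
[ -d "$used_dir" ] || echo "workdir was removed on exit"
```

Routing everything through EXIT keeps the cleanup logic in one place, whether the job finishes normally or is interrupted.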

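🔧 Use process substitution to avoid temporary files

diff <(sort a) <(sort b)

Each <(...) still runs in its own subshell; what you save is the temporary files and their cleanup. Process substitution is available in Bash, Zsh, and ksh, not in POSIX sh.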
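
Process substitution also lets a loop read a command's output while the loop itself stays in the current shell. A small Bash sketch with made‑up input lines:

```shell
# Bash (not POSIX sh): feed the loop from a process substitution.
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf 'a\nb\nc\n')    # the loop body runs in the current shell

echo "count=$count"             # the variable set inside the loop is still visible
```

With `printf ... | while read ...` instead, Bash by default runs the loop in a subshell and `count` would be lost when the pipeline ends.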

🔧 Use set -m (job control) only in interactive shells

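Avoid enabling job control in scripts: it changes how background jobs are grouped and how they interact with the terminal. Enable it only when you deliberately need each background job in its own process group.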


⚠️ Pitfalls

⚠️ Zombie processes from unreaped children

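cmd &
# no wait → the exit status is never collected; if the parent keeps running
# and never reaps cmd, it lingers as a zombie after it exits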

⚠️ Ctrl‑C not stopping pipelines

Ctrl‑C sends SIGINT to every process in the terminal's foreground process group. A stage that was moved into another process group (for example with setsid), or that ignores SIGINT, keeps running.

⚠️ Traps not firing in subshells

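( trap 'echo hi' EXIT )

The trap fires when the subshell exits, not when the parent does. Conversely, handlers installed by the parent are reset to their defaults inside a subshell (ignored signals stay ignored).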
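
A small sketch that makes the ordering visible (the messages are arbitrary):

```shell
report=$(
  ( trap 'echo "subshell EXIT trap"' EXIT
    echo "inside subshell" )
  echo "back in parent"
)
printf '%s\n' "$report"
```

The EXIT message appears between the other two lines: the trap belongs to the subshell and fires when it ends, before the enclosing shell continues.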

⚠️ Using kill -9 as a default

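SIGKILL cannot be caught, so the process gets no chance to clean up and may leave locks, temp files, or half‑written data behind. Send SIGTERM first and escalate only if the process does not exit.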
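
One way to express "SIGTERM first, SIGKILL as a last resort" is a small helper. `stop_gracefully` and its default grace period are hypothetical names chosen for this sketch, not a standard command:

```shell
# stop_gracefully PID [GRACE_SECONDS]
stop_gracefully() {
  target=$1
  grace=${2:-5}

  kill -TERM "$target" 2>/dev/null || return 0   # nothing to stop
  waited=0
  while kill -0 "$target" 2>/dev/null; do        # kill -0 only tests for existence
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$target" 2>/dev/null           # last resort after the grace period
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

sleep 30 &                 # stand-in for a real background process
victim=$!
stop_gracefully "$victim" 3
wait "$victim"
stop_status=$?
echo "exit status: $stop_status"
```

`kill -0` also succeeds for an exited child that has not been reaped yet, so the helper can wait slightly longer than strictly necessary.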


🚨 Real‑World Failures

🚨 Failure: Shell script used as PID 1 leaks zombies

In Docker:

CMD ["sh", "-c", "run-app.sh"]

sh becomes PID 1 → orphaned processes are reparented to it, but it does not reliably reap them → zombies accumulate.

Fix:

  • use tini or dumb-init
  • or implement a SIGCHLD handler + wait
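
As a rough sketch of the second option, the helper below starts one workload, forwards SIGTERM/SIGINT to it, and waits for its real exit status. `run_and_forward` is a hypothetical name, and the sketch handles only a single direct child; reaping arbitrary orphans is where tini or dumb-init is the simpler choice.

```shell
# run_and_forward CMD [ARGS...]
run_and_forward() {
  "$@" &                      # start the real workload as a child
  child=$!
  interrupted=0
  trap 'interrupted=1; kill -TERM "$child" 2>/dev/null' TERM INT

  wait "$child"; status=$?    # a trapped signal makes wait return early
  if [ "$interrupted" -eq 1 ]; then
    wait "$child"; status=$?  # wait again for the child's real exit status
  fi
  trap - TERM INT
  return "$status"
}

# Without any signal, the child's own exit status is passed through.
run_and_forward sh -c 'exit 7'
plain_status=$?
echo "child exited with $plain_status"
```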

🚨 Failure: CI job hangs due to orphaned background process

long_task &
exit 0

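The background process keeps running and typically still holds the job's stdout/stderr open → the CI runner keeps waiting for output and the job never finishes.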

Fix:

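# Bash: stop only this script's own background jobs (kill 0 would also signal the script itself and everything else in its process group)
trap 'kill $(jobs -p) 2>/dev/null' EXIT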
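
A sketch of the same idea that tracks the background PID explicitly and therefore also works in plain sh. Here `sleep 30` stands in for `long_task`, and the command substitution plays the role of a CI runner reading the job's output:

```shell
run_job() (
  sleep 30 &                                  # stand-in for long_task
  task_pid=$!
  trap 'kill "$task_pid" 2>/dev/null' EXIT    # stop the task when the script exits
  echo "started task $task_pid"
  exit 0
)

# Command substitution waits until every writer has closed the pipe. Without
# the trap, this line would block until sleep finished on its own.
job_output=$(run_job)
echo "$job_output"
```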

🚨 Failure: Ctrl‑C doesn’t stop a pipeline

cmd1 | cmd2

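Ctrl‑C signals the terminal's foreground process group. If the pipeline stages do not all belong to that group, or one of them ignores or traps SIGINT, part of the pipeline keeps running.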


🛠️ Patterns

🛠️ Pattern: Explicit signal handling

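# Drop the traps first: kill 0 also signals this shell, which would otherwise re-enter the handler.
trap 'trap - INT TERM; echo stopping; kill 0' INT TERM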

🛠️ Pattern: Use wait for all children

pids=()
for x in {1..5}; do
  worker "$x" &             # worker stands for your own command or function
  pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
  wait "$pid" || failed=1   # reap each child and remember any failure
done

🛠️ Pattern: Use a minimal init in containers

tini or dumb-init solves:

  • zombie reaping
  • signal forwarding
  • predictable shutdown

❌ Anti‑Patterns

❌ Anti‑pattern: Using shell as a process supervisor

Shell is not systemd. Avoid:

  • long‑running loops
  • manual restarts
  • complex signal routing

❌ Anti‑pattern: Ignoring SIGCHLD

A long‑running parent that never reacts to its children exiting (no wait, no SIGCHLD handling) accumulates zombies.

❌ Anti‑pattern: Running background jobs without cleanup

cmd &
exit

🔍 Debugging

🔍 Inspect process tree

ps f
pstree -p

🔍 Inspect process groups

ps -o pid,pgid,comm

🔍 Trace signals

strace -e trace=signal -f sh script.sh

⚙️ Performance

⚙️ Avoid excessive forking

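Every external command costs a fork and an exec. In hot loops, prefer builtins and parameter expansion (for example ${path%/*} instead of calling dirname).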

⚙️ Avoid long‑running background loops

Loops that poll without sleeping burn CPU, and every extra background loop is one more process to signal and reap at shutdown.

⚙️ Use wait -n (Bash 4.3+) for efficient worker pools
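
A minimal sketch of such a pool, assuming Bash 4.3 or newer. The `worker` function, the concurrency limit, and the results file are placeholders for illustration:

```shell
# Bash 4.3+ only: wait -n is not available in POSIX sh.
results=$(mktemp)
worker() {                      # hypothetical worker: replace with real work
  sleep 0.2
  echo "worker $1 done" >> "$results"
}

max_jobs=2
running=0
for x in 1 2 3 4 5; do
  if [ "$running" -ge "$max_jobs" ]; then
    wait -n                     # returns as soon as any one job finishes
    running=$((running - 1))
  fi
  worker "$x" &
  running=$((running + 1))
done
wait                            # reap the remaining workers

finished=$(wc -l < "$results")
rm -f "$results"
echo "finished workers: $finished"
```

Compared with waiting on PIDs in launch order, `wait -n` refills a slot as soon as any worker finishes, so one slow worker does not stall the pool.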


🐳 Containers

🐳 Shell as PID 1

PID 1 has special semantics:

  • signals without an installed handler are dropped, so SIGTERM does nothing unless it is trapped
  • must reap children
  • must forward signals

🐳 Use an init wrapper

ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["run.sh"]
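
Inside a script launched by the init wrapper (such as run.sh above), a common complement is to finish with `exec`, so the application replaces the shell instead of running as its child and receives forwarded signals directly. The snippet below only demonstrates that property with nested `sh -c` calls; no real application is involved.

```shell
# exec replaces the outer shell with the inner one instead of forking,
# so both lines report the same PID.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

first=$(printf '%s\n' "$pids" | sed -n 1p)
second=$(printf '%s\n' "$pids" | sed -n 2p)
echo "before exec: $first, after exec: $second"
```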

🛰️ CI/CD

🛰️ Ensure deterministic shutdown

CI runners typically stop jobs with SIGTERM and escalate to SIGKILL after a grace period → scripts must handle SIGTERM and exit promptly.

🛰️ Avoid background jobs unless necessary

They often outlive the job and cause hangs.


🧠 Summary

Process control is the backbone of reliable shell scripting. Mastering:

  • process groups
  • signals
  • job control
  • subshells
  • reaping
  • PID 1 behavior

…is essential for writing safe, predictable, production‑grade automation.