⚙️ Advanced Shell Execution Model

🧠 Overview

This document explains how the shell actually executes commands: forks, execs, subshells, pipelines, builtins, redirections, process groups, and the lifecycle of a command from AST node to running process. Understanding this model is essential for writing predictable, safe, and high‑performance shell scripts.


🎓 Who this is for

  • Engineers writing complex scripts or orchestrators.
  • DevOps/SRE working with CI/CD, containers, and automation.
  • Anyone debugging weird shell behavior (subshells, zombies, pipelines).
  • People who want to understand the shell as a runtime, not syntax.

🧩 Internals / Mechanics

🧩 Execution phases

Once parsing and expansion are complete, the shell executes commands using this model:

  1. Determine command type
     • builtin
     • function
     • external program
     • compound command
     • subshell

  2. Prepare redirections
     • open files
     • duplicate file descriptors
     • set up pipes

  3. Execute
     • builtins run in the shell process
     • external commands require fork() followed by execve()
     • pipelines create multiple children
     • subshells create isolated environments

  4. Wait / collect status
     • update $?
     • update job table
     • propagate pipeline exit codes
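The fork, exec, and wait steps can be approximated from the shell itself. The sketch below is only an illustration and assumes a POSIX-style `sh` is on PATH: a `( ... )` subshell stands in for the forked child, `exec` replaces that child with the external program, and the parent reads the collected status from `$?`. The variable names are chosen for the example.

```shell
# ( ... ) forks a child; exec replaces the child's image with sh, which
# exits with status 3. The parent waits and sees that status in $?.
status=0
( exec sh -c 'exit 3' ) || status=$?
printf 'child exited with %s\n' "$status"

# A builtin runs inside the current shell process, so its effect persists.
before=$PWD
cd /
after=$PWD
cd "$before"
printf 'cd changed PWD from %s to %s\n' "$before" "$after"
```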

🧩 Builtins vs external commands

Type      | Fork?  | Affects shell state? | Examples
Builtin   | ❌ No  | ✔ Yes                | cd, export, set, read
External  | ✔ Yes  | ❌ No                | ls, grep, awk, sed
Function  | ❌ No  | ✔ Yes                | user‑defined
Subshell  | ✔ Yes  | ❌ No                | (cd /tmp)

This distinction is critical for understanding why some commands persist state and others don’t.
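To check which category a name falls into, bash offers `type -t`. The following is a small sketch that assumes bash and a non-interactive script (so no aliases are involved). `greet` and `marker` are hypothetical names used only for the demonstration.

```shell
#!/usr/bin/env bash
# greet is a hypothetical function defined only for this demonstration.
greet() { printf 'hello\n'; }

kind_cd=$(type -t cd)        # "builtin"
kind_ls=$(type -t ls)        # "file", i.e. an external program found on PATH
kind_greet=$(type -t greet)  # "function"

marker=parent
( marker=subshell )          # the assignment is lost when the subshell exits
state_after_subshell=$marker
{ marker=group; }            # the assignment happens in the current shell
state_after_group=$marker

printf '%s %s %s\n' "$kind_cd" "$kind_ls" "$kind_greet"
printf '%s %s\n' "$state_after_subshell" "$state_after_group"
```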

🧩 Pipelines

A pipeline like:

cmd1 | cmd2 | cmd3

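creates one process per command (three here), normally placed in a single process group when job control is active. Which stages run in a subshell depends on the shell:

  • bash runs every stage, including cmd3, in a subshell (unless lastpipe is enabled)
  • ksh and zsh run the last stage, cmd3, in the current shell
  • POSIX allows either behavior, so portable scripts should assume a subshell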

This means:

  • variable changes inside pipelines often do not persist
  • the pipeline's exit status is that of the last command unless pipefail is set

🔧 Techniques

🔧 Use builtins to avoid unnecessary forks

Prefer:

[[ "$x" == foo* ]]
(( i++ ))
printf '%s\n' "$var"

over:

printf '%s\n' "$x" | grep -q '^foo'
i=$(expr "$i" + 1)
echo "$var"

🔧 Use grouping to control execution environment

  • ( ... ) → subshell
  • { ...; } → same shell

Example:

# subshell, PWD does not persist
( cd /tmp )

# same shell, PWD persists
{ cd /tmp; }

🔧 Control pipeline exit behavior

set -o pipefail

makes the pipeline return the status of the rightmost command that failed, or 0 if every command succeeded.
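The difference is easy to observe with `false | true`. This sketch assumes bash, because it also reads the bash-specific `PIPESTATUS` array; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Default behavior: the pipeline's status is that of the last command (true).
set +o pipefail
false | true
default_status=$?

# PIPESTATUS records the status of every stage of the most recent pipeline.
false | true
stages="${PIPESTATUS[*]}"

# With pipefail, the rightmost non-zero status becomes the pipeline's status.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?
set +o pipefail

printf 'default=%s stages=%s pipefail=%s\n' \
  "$default_status" "$stages" "$pipefail_status"
# prints: default=0 stages=1 0 pipefail=1
```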


⚠️ Pitfalls

⚠️ Expecting state to persist across pipelines

count=0
echo "a b c" | while read _; do
  count=$((count+1))
done
echo "$count"   # often prints 0

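This prints 0 in bash because every pipeline stage, including the while loop, runs in a subshell, so the increment is lost when that subshell exits. ksh and zsh run the last stage in the current shell and print 1.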
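One common way around this in bash is to feed the loop through a redirection instead of a pipe. The sketch below assumes bash, since process substitution `<( ... )` is not POSIX sh.

```shell
#!/usr/bin/env bash
count=0
# The loop reads from a process substitution rather than a pipe, so the
# while loop itself runs in the current shell and its assignments survive.
while read -r _; do
  count=$((count + 1))
done < <(printf '%s\n' a b c)
printf '%s\n' "$count"   # prints 3
```

In bash 4.2 and later, `shopt -s lastpipe` is an alternative, but it only takes effect when job control is off, as it is in non-interactive scripts.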

⚠️ Misunderstanding command substitution

result=$(cd /tmp && pwd)
pwd   # unchanged

Command substitution always runs in a subshell.

⚠️ Redirection order surprises

cmd >file 2>&1

is different from:

cmd 2>&1 >file

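because redirections are applied left to right. The first form points stdout at file and then duplicates stderr onto it, so both streams land in file. The second duplicates stderr onto the original stdout (usually the terminal) before stdout is redirected, so only stdout lands in file.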
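The two orderings can be compared side by side. In this sketch, `emit` is a hypothetical helper that writes one line to each stream, and a second temporary file stands in for the terminal so the result can be inspected.

```shell
# emit is a hypothetical helper that writes one line to each stream.
emit() { echo "to-stdout"; echo "to-stderr" >&2; }

tmp=$(mktemp -d)

# stdout is pointed at the file first, then stderr is duplicated onto it.
emit >"$tmp/both.log" 2>&1

# stderr is duplicated onto the current stdout (terminal.log stands in for
# the terminal), and only afterwards is stdout redirected to the file.
{ emit 2>&1 >"$tmp/stdout-only.log"; } >"$tmp/terminal.log"

both=$(cat "$tmp/both.log")
stdout_only=$(cat "$tmp/stdout-only.log")
terminal=$(cat "$tmp/terminal.log")
rm -rf "$tmp"

printf 'both.log:\n%s\n' "$both"
printf 'stdout-only.log: %s\n' "$stdout_only"
printf 'terminal.log: %s\n' "$terminal"
```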


🚨 Real‑World Failures

🚨 Failure: Pipeline hides failure in CI

docker build . | tee build.log

If docker build fails, the pipeline still exits with tee's status, which is normally 0, unless pipefail is set.

Fix:

set -o pipefail
docker build . | tee build.log

🚨 Failure: Subshell breaks deployment logic

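( cd deploy && terraform apply )
# later commands assume the shell is still in deploy/, but it is not

The cd only applies inside the subshell. Any follow-up step that expects deploy/ as its working directory runs in the original directory instead, where it can read or write Terraform files in the wrong place.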
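One way to avoid the surprise is to keep every directory-dependent step inside the same subshell and to use explicit paths afterwards. The sketch below uses throwaway directories, with `touch` as a placeholder for the real deployment command, so the names and layout are assumptions rather than a real project.

```shell
# Throwaway directories stand in for a repository with a deploy/ folder.
workdir=$(mktemp -d)
mkdir "$workdir/deploy"
start=$PWD
cd "$workdir"

# Every step that needs deploy/ as its working directory runs in one
# subshell; touch is a placeholder for the real deployment command.
( cd deploy && touch applied.marker )

# The parent shell never left $workdir, so later paths are spelled out.
after=$PWD
marker_created=no
[ -f "$workdir/deploy/applied.marker" ] && marker_created=yes

cd "$start"
rm -rf "$workdir"
printf 'parent stayed in %s, marker created: %s\n' "$after" "$marker_created"
```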


🛠️ Patterns

🛠️ Pattern: Explicit execution boundaries

Use:

  • { ...; } for shared state
  • ( ... ) for isolated execution

This makes intent clear.

🛠️ Pattern: Fail‑fast pipelines

Always:

set -euo pipefail

in CI/CD or production scripts.

🛠️ Pattern: Minimize forks in tight loops

Use builtins and arithmetic expansions.
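As an illustration, the loop below strips directories and extensions with parameter expansion instead of calling basename or sed once per iteration. It assumes bash (arrays and `(( ))`); the paths are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical input paths; nothing is read from disk.
paths=(/var/log/app/a.log /var/log/app/b.log /var/log/app/c.log)
names=()
total=0
for p in "${paths[@]}"; do
  file=${p##*/}              # like basename "$p", without a fork
  names+=("${file%.log}")    # like sed 's/\.log$//', without a fork
  (( total += ${#file} ))    # string length via expansion, no wc or expr
done
printf '%s\n' "${names[*]}" "$total"
```

Each iteration runs entirely inside the shell process, so the cost per item stays roughly constant regardless of how slow process creation is on the host.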


❌ Anti‑Patterns

❌ Anti‑pattern: Using echo for data processing

echo is not reliable for structured output. Use printf.
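A short demonstration of the problem, assuming bash's builtin echo with default settings: data that happens to look like an option is swallowed by echo but printed verbatim by printf.

```shell
#!/usr/bin/env bash
var='-n'
from_echo=$(echo "$var")              # echo treats -n as an option: empty output
from_printf=$(printf '%s\n' "$var")   # printf prints the data as given

printf 'echo gave [%s], printf gave [%s]\n' "$from_echo" "$from_printf"
# prints: echo gave [], printf gave [-n]
```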

❌ Anti‑pattern: Relying on pipeline side effects

Pipelines are for data flow, not state mutation.

❌ Anti‑pattern: Silent failure swallowing

Scripts that ignore exit codes keep running after a failed step, so later commands act on missing or partial results and the failure surfaces far from its cause.


🔍 Debugging

🔍 Trace execution with set -x

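Prints each command to stderr after expansion, prefixed with PS4 (default "+ "). It shows:

  • expanded words and assignments
  • executed commands, including those inside functions and subshells

It does not show forks, and bash does not print redirections; use the process tree or strace for those.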
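Tracing can be limited to one region and made easier to read by customizing PS4. The sketch below assumes bash; it captures the trace in a temporary file only so the result can be inspected, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
trace_file=$(mktemp)
name="world"

# xtrace output goes to stderr, so redirect stderr for this group only.
{
  PS4='+ line ${LINENO}: '   # expanded again before every traced command
  set -x
  greeting="hello $name"
  set +x
} 2>"$trace_file"

trace=$(cat "$trace_file")
rm -f "$trace_file"
printf '%s\n' "$trace"
```

The trace shows the assignment with `$name` already expanded, which is usually the quickest way to spot a quoting or expansion mistake.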

🔍 Inspect process tree

Use:

ps f
pstree -p

to see how pipelines and subshells spawn.

🔍 Debug redirections

Use:

strace -e trace=process,desc -f sh script.sh

⚙️ Performance

⚙️ Avoid excessive forking

Every external command costs a fork() plus an execve(). Inside a loop that runs thousands of times, this overhead adds up quickly.

⚙️ Use builtins for arithmetic and tests

((i++))
[[ -f file ]]

⚙️ Batch operations with xargs

# -n caps the files per gzip call so that -P can run several calls in parallel
printf '%s\0' ./*.log | xargs -0 -n 16 -P "$(nproc)" gzip

🧵 Process Control

🧵 Foreground vs background

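The foreground job's process group receives keyboard-generated signals from the terminal:

  • SIGINT (Ctrl-C)
  • SIGQUIT (Ctrl-\)
  • SIGTSTP (Ctrl-Z)

Background jobs do not receive these, and are stopped with SIGTTIN if they try to read from the terminal.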

🧵 Process groups

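With job control enabled, the shell places all processes of a pipeline in one process group. A signal sent to the group, such as Ctrl-C from the terminal or kill with a negative PGID, reaches every process in it.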


🐳 Containers

🐳 Shell as PID 1

If the shell is PID 1:

  • it must reap zombies
  • it must forward signals
  • it must handle SIGTERM explicitly

Otherwise zombie processes accumulate, or the container ignores the stop request until the runtime kills it.
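A minimal sketch of an entrypoint that covers the signal-forwarding part, assuming bash. `run_as_init` and its variables are hypothetical names. It handles a single direct child only and does not reap orphaned grandchildren, which is the job of a dedicated init such as tini (for example via `docker run --init`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command as a child, forward SIGTERM/SIGINT
# to it, and return the child's exit status.
run_as_init() {
  "$@" &
  child=$!
  # On Linux, PID 1 only receives signals it has installed handlers for,
  # so without this trap a stop request would never reach the child.
  trap 'kill -TERM "$child" 2>/dev/null || true' TERM INT
  status=0
  wait "$child" || status=$?
  # A trapped signal interrupts wait with a status above 128 while the
  # child may still be shutting down; keep waiting until it is gone.
  while kill -0 "$child" 2>/dev/null; do
    wait "$child" || status=$?
  done
  trap - TERM INT
  return "$status"
}

# Usage sketch: the real service command would replace sh -c 'exit 7'.
rc=0
run_as_init sh -c 'exit 7' || rc=$?
printf 'child status: %s\n' "$rc"
```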


🛰️ CI/CD

🛰️ Deterministic execution

CI shells must:

  • fail fast
  • avoid interactive features
  • avoid relying on user dotfiles
  • log clearly

🛰️ Use explicit exit codes

command || { echo "failed" >&2; exit 1; }
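When a script has several steps, a small helper keeps this pattern consistent. `die` below is a hypothetical function, and the step name and exit code are illustrative. The failing step runs in a subshell here only so that the demonstration does not terminate the surrounding script.

```shell
# die is a hypothetical helper: print a message to stderr and exit with
# the given status so the CI log shows which step failed.
die() {
  code=$1
  shift
  printf 'error: %s\n' "$*" >&2
  exit "$code"
}

# false stands in for a failing build command.
build_status=0
( false || die 3 "build step failed" ) 2>/dev/null || build_status=$?
printf 'build exited with %s\n' "$build_status"   # prints: build exited with 3
```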

🧠 Summary

The shell execution model is built on:

  • fork/exec
  • subshells
  • builtins
  • pipelines
  • redirections
  • process groups

Mastering these mechanics makes scripts predictable, safe, and production‑ready.