🚨 Advanced Shell Error Handling
🧠 Overview
Error handling in shell scripting is deceptively complex. Unlike modern languages, the shell has:
- no exceptions
- no stack unwinding
- no structured error types
- implicit propagation rules
- context‑dependent exit behavior
- silent failure modes
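This document explains how to design predictable, fail‑fast, production‑grade error handling in POSIX‑style shells. Several techniques (the ERR trap, BASH_SOURCE, FUNCNAME) are Bash extensions, and pipefail is missing from some older /bin/sh implementations.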
🎓 Who this is for
- DevOps/SRE writing CI/CD pipelines or deployment scripts.
- Engineers building automation, entrypoints, or orchestration logic.
- Anyone who wants deterministic behavior instead of “shell roulette”.
- People debugging silent failures, partial runs, or inconsistent exit codes.
🧩 Internals / Mechanics
🧩 Exit codes
Every command returns an integer exit status:
0 → success
1–125 → command‑defined failure
126 → command found but not executable
127 → command not found
128+N → terminated by signal N
The status of the most recently executed command is stored in $?.
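The ranges above can be observed directly. This is a minimal sketch, assuming a POSIX-like system with sh and mktemp available; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Record the exit status of one representative command per category.
true
ok=$?                      # 0: success

sh -c 'exit 3'
custom=$?                  # 3: command-defined failure

tmp=$(mktemp)              # mktemp creates the file without execute permission
"$tmp" 2>/dev/null
not_executable=$?          # 126: found but not executable
rm -f "$tmp"

/nonexistent/command 2>/dev/null
not_found=$?               # 127: command not found

sh -c 'kill -TERM $$'
terminated=$?              # 143: 128 + 15 (SIGTERM)

printf 'ok=%s custom=%s not_executable=%s not_found=%s terminated=%s\n' \
  "$ok" "$custom" "$not_executable" "$not_found" "$terminated"
```

Note that $? is overwritten by every command, so it has to be saved immediately after the command of interest.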
🧩 How the shell decides to stop or continue
By default:
- the shell does not stop on errors
- pipelines return the exit code of the last command
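- a function returns the exit status of the last command it ran (or the argument of return)
- a subshell reports its exit status to the parent through $?, but exit, set, and variable changes inside it do not affect the parent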
🧩 set -e (errexit)
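set -e makes the shell exit when a command returns a non‑zero status, except when the failing command is:
- the condition of if, while, or until
- any command in a && or || list other than the last
- negated with !
- any stage of a pipeline other than the last (unless pipefail is set)
Two related traps:
- in Bash, command substitutions do not inherit -e unless inherit_errexit (Bash 4.4+) or POSIX mode is enabled
- subshells and functions do inherit -e, but -e is ignored throughout their bodies when they are called from one of the exempt contexts above (for example if f; then ... or f || true)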
This makes set -e powerful but dangerous if misunderstood.
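The exemptions are easier to remember after seeing them run. The sketch below executes a small script in a child bash (so its exit does not end the calling script) and records how far it gets; the numbered messages are illustrative.

```shell
#!/usr/bin/env bash
# Each numbered line follows a failing command that set -e deliberately ignores.
script='
set -e
false || echo "1: the left side of || is exempt"
if false; then :; fi
echo "2: an if condition is exempt"
! true
echo "3: a negated command is exempt"
false | true
echo "4: only the last pipeline stage counts"
false
echo "5: never printed"
'
status=0
out=$(bash -c "$script") || status=$?
printf '%s\nchild exit status: %s\n' "$out" "$status"
```

Only the final bare false is a plain failing command, so that is where the child script stops.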
🧩 set -o pipefail
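Without pipefail, a pipeline's exit status is that of its last command, so a failure earlier in the pipeline is masked: false | true returns 0.
With pipefail, the pipeline returns the status of the rightmost command that failed: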
| set -o pipefail
false | true
echo $? # 1
|
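For a side-by-side comparison, the following sketch runs the same pipeline in child shells with and without the option. It assumes bash is on the PATH; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Same pipeline, two settings: the status is printed by the child shell.
without=$(bash -c 'false | true; echo $?')
with=$(bash -c 'set -o pipefail; false | true; echo $?')

# With several failures, the rightmost failing stage determines the status.
rightmost=$(bash -c 'set -o pipefail; sh -c "exit 3" | sh -c "exit 4" | true; echo $?')

printf 'without=%s with=%s rightmost=%s\n' "$without" "$with" "$rightmost"
```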
🔧 Techniques
🔧 Use the “strict mode” trio
-e → fail fast
-u → error on unset variables
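-o pipefail → fail on pipeline errors
Enable all three with set -euo pipefail at the top of the script, then validate required variables with the ${VAR:?message} expansion: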
| : "${WORKSPACE:?must be set}"
: "${ENV:?missing ENV}"
|
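A quick way to see strict mode and the ${VAR:?message} check working together is to run the same script with and without the variable. This sketch assumes env supports -u (GNU and BSD versions do); /tmp/ws is a placeholder path.

```shell
#!/usr/bin/env bash
# The script under test: strict mode first, then validation, then work.
script='
set -euo pipefail
: "${WORKSPACE:?must be set}"
echo "building in $WORKSPACE"
'
missing_status=0
missing_out=$(env -u WORKSPACE bash -c "$script" 2>&1) || missing_status=$?
present_out=$(WORKSPACE=/tmp/ws bash -c "$script")

printf 'unset: status=%s output=%s\n' "$missing_status" "$missing_out"
printf 'set:   %s\n' "$present_out"
```

The check fires before any work is done, and the message names the offending variable.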
🔧 Use || exit for explicit control
| command || { printf '%s\n' "failed" >&2; exit 1; }
|
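When the same pattern appears many times, a small helper keeps it readable. The die function below is a common convention, not something the shell provides; its name and argument order are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# die MESSAGE [STATUS]: print to stderr and exit with STATUS (default 1).
die() {
  code=${2:-1}
  printf 'error: %s\n' "$1" >&2
  exit "$code"
}

# Demonstrated inside a subshell so that exit does not end this script.
msg=$( (false || die "build failed" 3) 2>&1 )
rc=$?
printf 'message=[%s] status=%s\n' "$msg" "$rc"
```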
🔧 Use functions to encapsulate error boundaries
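One way to read this: a function validates its inputs, reports its own failures, and returns a distinct status that the caller can act on. A minimal sketch, where copy_config and its status codes are hypothetical:

```shell
#!/usr/bin/env bash
# copy_config SRC DEST: returns 2 if SRC is unreadable, 3 if the copy fails.
copy_config() {
  src=$1
  dest=$2
  [ -r "$src" ] || { printf 'copy_config: cannot read %s\n' "$src" >&2; return 2; }
  cp -- "$src" "$dest" || return 3
}

workdir=$(mktemp -d)
printf 'port=8080\n' > "$workdir/app.conf"

copy_config "$workdir/app.conf" "$workdir/copy.conf"
good=$?
copy_config "$workdir/missing.conf" "$workdir/copy2.conf" 2>/dev/null
bad=$?

printf 'good=%s bad=%s\n' "$good" "$bad"
rm -rf "$workdir"
```

Using return rather than exit leaves the decision to abort with the caller.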
🔧 Use trap for cleanup
| trap cleanup EXIT   # runs on every exit path and keeps the original exit status; ERR is Bash-only and skips explicit exits
|
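To check that cleanup really runs when a script dies part-way through, the sketch below runs a failing script in a child bash and then inspects the temporary directory it created. The cleanup function and variable names are illustrative.

```shell
#!/usr/bin/env bash
script='
set -eu
workdir=$(mktemp -d)
cleanup() { rm -rf "$workdir"; }
trap cleanup EXIT     # installed before any step that can fail
echo "$workdir"       # report the path so the caller can inspect it
false                 # simulated failure: set -e exits, the trap still runs
echo "never printed"
'
status=0
workdir=$(bash -c "$script") || status=$?

if [ -d "$workdir" ]; then leftover=yes; else leftover=no; fi
printf 'child status=%s, temp dir left behind: %s\n' "$status" "$leftover"
```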
⚠️ Pitfalls
⚠️ set -e not triggering when expected
| set -e
cmd || true # suppresses errexit
|
⚠️ Silent failures in pipelines
| docker build . | tee log.txt   # exit status is tee's, so a failed build goes unnoticed
|
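In Bash, the PIPESTATUS array exposes the status of every stage, which helps when pipefail alone is too coarse. In this sketch, build is a hypothetical stand-in for a failing command such as docker build.

```shell
#!/usr/bin/env bash
script='
build() { echo "step 1/3"; return 42; }
build | tee /dev/null
echo "pipeline=$? build=${PIPESTATUS[0]}"
'
out=$(bash -c "$script")
printf '%s\n' "$out"
```

PIPESTATUS is reset by the next command, so read it on the line immediately after the pipeline.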
⚠️ Unset variables causing unexpected crashes
| echo "$UNSET" # with -u → script aborts
|
⚠️ Errors hidden in command substitutions
| export result="$(failing_cmd)" # status is that of export (0), so the failure is masked; the same applies to local and readonly
|
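The difference between a plain assignment and a declaration command can be checked under set -e. A small sketch, using false as the failing command and version as an illustrative variable name:

```shell
#!/usr/bin/env bash
# Declaration plus substitution: the status of export hides the failure.
masked=$(bash -c 'set -e; export version="$(false)"; echo "reached"')

# Plain assignment first, export afterwards: the failure stops the script.
caught_status=0
caught=$(bash -c 'set -e; version="$(false)"; export version; echo "reached"') || caught_status=$?

printf 'export with substitution: [%s]\n' "$masked"
printf 'plain assignment: [%s] status=%s\n' "$caught" "$caught_status"
```

Splitting the assignment from export, local, or readonly keeps the exit status visible.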
⚠️ Subshells swallowing errors
| ( failing_cmd; other_cmd )
echo $? # status of other_cmd: without set -e inside the subshell, the earlier failure is masked
|
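A quick check of what the parent actually sees from a subshell, using false and exit as stand-ins for real commands:

```shell
#!/usr/bin/env bash
( false )
single=$?          # 1: the status of the only command

( false; true )
masked=$?          # 0: a later success hides the earlier failure

( exit 7 )
exited=$?          # 7: exit ends the subshell only

echo "single=$single masked=$masked exited=$exited (parent still running)"
```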
🚨 Real‑World Failures
🚨 Failure: Deployment script continues after failed build
| npm run build
docker build .
|
If npm run build fails, script continues → deploys broken image.
Fix:
| set -e
npm run build
docker build .
|
🚨 Failure: CI pipeline passes despite failing stage
In lint | tee lint.log, lint fails → tee succeeds → pipeline exit = 0.
Fix:
| set -o pipefail
lint | tee lint.log
|
🚨 Failure: Cleanup not executed on error
If the script crashes before setting the trap → no cleanup.
Fix:
Set traps at the top of the script.
🛠️ Patterns
🛠️ Pattern: Fail fast, fail loud
🛠️ Pattern: Explicit error boundaries
| safe_section() (   # parentheses: the body runs in a subshell
set -e              # so errexit does not leak into the caller
...
)
|
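How the boundary is called matters as much as how it is written. The sketch below, with a hypothetical section function, shows that calling it on the left of || switches errexit off inside it, while a plain call followed by a status check keeps it on.

```shell
#!/usr/bin/env bash
script='
section() ( set -e; false; echo "section kept going" )

section || echo "handler A ran"     # -e is ignored inside section here
section                             # plain call: -e is honored inside
status=$?
[ "$status" -eq 0 ] || echo "handler B ran (status $status)"
'
out=$(bash -c "$script")
printf '%s\n' "$out"
```

The first call runs past the failure and reports success, so handler A never fires. The second call stops at false and hands status 1 to the caller. If the caller itself runs under set -e, a plain failing call ends the caller too, which is usually the intended fail-fast behavior.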
🛠️ Pattern: Use ERR trap for global error handling
| trap 'printf "Error on line %s\n" "$LINENO" >&2' ERR   # Bash only; add set -E so functions and subshells inherit it
|
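A slightly fuller handler can also report the failing command and its status. This sketch is Bash-specific (ERR, BASH_COMMAND, set -E); build is a hypothetical failing step, and the script is written to a temporary file so that line numbers are meaningful.

```shell
#!/usr/bin/env bash
demo=$(mktemp)
cat > "$demo" <<'EOF'
set -Eeuo pipefail   # -E: functions and subshells inherit the ERR trap
trap 'printf "error: line %s: %s (exit %s)\n" "$LINENO" "$BASH_COMMAND" "$?" >&2' ERR
build() { false; }   # hypothetical failing step
build
echo "never printed"
EOF

status=0
report=$(bash "$demo" 2>&1) || status=$?
rm -f "$demo"
printf '%s\nscript exit status: %s\n' "$report" "$status"
```

The ERR trap follows the same exemptions as set -e, so it stays silent for failures inside if conditions and && or || lists.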
🛠️ Pattern: Validate invariants before doing anything destructive
| [ -d "$TARGET" ] || { printf '%s\n' "not a directory: $TARGET" >&2; exit 1; }
|
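For destructive operations it can help to bundle the checks into one guard. The purge_dir helper below is hypothetical and only a sketch: it refuses an empty argument, a non-directory, and the root directory before deleting anything.

```shell
#!/usr/bin/env bash
# purge_dir DIR: remove the (non-hidden) contents of DIR after validating it.
purge_dir() {
  target=${1-}
  [ -n "$target" ]     || { printf 'purge_dir: no target given\n' >&2; return 1; }
  [ -d "$target" ]     || { printf 'purge_dir: not a directory: %s\n' "$target" >&2; return 1; }
  [ "$target" != "/" ] || { printf 'purge_dir: refusing to purge /\n' >&2; return 1; }
  # ${target:?} aborts rather than expanding to "/*" if target is ever empty.
  rm -rf -- "${target:?}"/*
}

scratch=$(mktemp -d)
touch "$scratch/a" "$scratch/b"

purge_dir "$scratch"
purged=$?
remaining=$(ls -A "$scratch" | wc -l)

purge_dir "" 2>/dev/null
empty_arg=$?
purge_dir "$scratch/nope" 2>/dev/null
missing=$?

rmdir "$scratch"
echo "purged=$purged remaining=$remaining empty_arg=$empty_arg missing=$missing"
```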
❌ Anti‑Patterns
❌ Anti‑pattern: Ignoring exit codes
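❌ Anti‑pattern: Writing error messages to stdout with echo
Send errors to stderr so they do not mix with captured output, and prefer printf, which handles escapes and leading dashes consistently across shells: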
| printf '%s\n' "error" >&2
|
❌ Anti‑pattern: Relying on implicit behavior of set -e
❌ Anti‑pattern: Swallowing errors with || true
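When a non-zero status is expected, capturing it is safer than discarding it. The sketch below uses grep, whose exit status is 0 for a match, 1 for no match, and 2 or more for an error; the search function and the file contents are illustrative.

```shell
#!/usr/bin/env bash
# search PATTERN FILE: tolerate "no match" but still surface real errors.
search() {
  rc=0
  grep -q -- "$1" "$2" 2>/dev/null || rc=$?
  case $rc in
    0) echo "found" ;;
    1) echo "not found" ;;
    *) echo "grep failed with status $rc" >&2; return "$rc" ;;
  esac
}

data=$(mktemp)
printf 'alpha\n' > "$data"

hit=$(search alpha "$data")
miss=$(search beta "$data")
search alpha /nonexistent/file 2>/dev/null
broken=$?

rm -f "$data"
echo "hit=$hit miss=$miss broken=$broken"
```

The cmd || rc=$? form also keeps set -e from firing, without hiding the status the way || true does.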
🔍 Debugging
🔍 Use set -x to trace execution
Shows expanded commands before execution.
🔍 Use PS4 for detailed tracing
| export PS4='+ ${BASH_SOURCE[0]:-}:${LINENO}:${FUNCNAME[0]:-main}: '   # Bash only; the :- defaults keep the prompt safe under set -u outside functions
set -x
|
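To see what a trace looks like, this sketch runs a two-line script under set -x with a simplified PS4 and captures the trace, which goes to stderr. The greeting variable is illustrative.

```shell
#!/usr/bin/env bash
script='
PS4="+ line \${LINENO}: "
set -x
greeting="hello"
echo "$greeting"
'
# Merge stderr into stdout so the trace and the output are captured together.
trace=$(bash -c "$script" 2>&1)
printf '%s\n' "$trace"
```

The trace shows commands after expansion, which is why the echo line appears with the value of the variable rather than its name.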
🔍 Use trap '...' ERR to log failures
⚙️ Performance
⚙️ Avoid expensive error‑checking loops
Prefer batch validation: check all inputs once, up front, instead of re‑checking inside every loop iteration.
⚙️ Avoid unnecessary command substitutions
Each $(...) forks a subshell, which adds up quickly inside loops.
⚙️ Use builtins for checks
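Parameter expansion, case, and command -v cover many checks without starting another process. A short sketch, with an illustrative path and a hypothetical has_command helper:

```shell
#!/usr/bin/env bash
path=/var/log/app/server.log

file=${path##*/}     # like basename, without forking
dir=${path%/*}       # like dirname, without forking
ext=${file##*.}      # text after the last dot

# Pattern matching with case instead of piping through grep.
case $file in
  *.log) kind=log ;;
  *)     kind=other ;;
esac

# command -v is a builtin lookup, unlike the external which.
has_command() { command -v "$1" >/dev/null 2>&1; }

echo "file=$file dir=$dir ext=$ext kind=$kind"
```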
🧵 Process Control
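🧵 Errors inside subshells reach the parent only through the exit status
| ( failing_cmd )
echo $? # non-zero, but the parent keeps running unless it checks $? or runs under set -e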
|
🧵 Pipelines hide errors without pipefail
🧵 Traps behave differently in subshells
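One concrete difference: a subshell starts with the parent's traps reset, so an EXIT trap set in the parent does not fire when the subshell finishes. A minimal sketch, run in a child bash so the trap output can be captured:

```shell
#!/usr/bin/env bash
# The trap is set in the parent; the subshell ( : ) exits without running it.
out=$(bash -c 'trap "echo cleanup-ran" EXIT; ( : ); echo "after subshell"')
printf '%s\n' "$out"

# Count how many times the trap fired.
count=$(printf '%s\n' "$out" | grep -c 'cleanup-ran')
echo "trap fired $count time(s)"
```

Cleanup that must also happen inside a subshell needs its own trap there.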
🐳 Containers
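🐳 PID 1 ignores signals it has no handler for
The kernel does not apply default signal actions to PID 1, so a script running as the container entrypoint ignores SIGTERM unless it explicitly traps it.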
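A minimal sketch of an entrypoint that traps SIGTERM and forwards it to its child. The entrypoint function is hypothetical, sleep stands in for the real service, and the demonstration runs as an ordinary background job rather than as PID 1 in a container.

```shell
#!/usr/bin/env bash
# entrypoint: start the service, then forward SIGTERM and exit with 128+15.
entrypoint() {
  trap 'kill -TERM "$child" 2>/dev/null; wait "$child"; exit 143' TERM
  sleep 30 &          # stand-in for the real service
  child=$!
  wait "$child"       # wait is interruptible, so the trap runs promptly
}

( entrypoint ) &
pid=$!
sleep 1               # give the subshell time to install its trap
kill -TERM "$pid"
wait "$pid"
status=$?
echo "entrypoint exited with status $status"
```

Running the service in the background and blocking in wait is what lets the trap fire immediately. A foreground child would delay the handler until that child exits.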
🐳 Entry points must validate environment variables
Containers often run with missing or empty env vars.
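Checking all required variables in one pass reports every problem at once instead of one per restart. In this sketch, require_env and the variable names DB_HOST, DB_PORT, and API_TOKEN are hypothetical.

```shell
#!/usr/bin/env bash
# require_env NAME...: fail if any named variable is unset or empty.
require_env() {
  missing=
  for name in "$@"; do
    # eval is acceptable here only because the names are literals chosen
    # by the script author, never user input.
    eval "value=\${$name-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  [ -z "$missing" ] || { printf 'missing required variables:%s\n' "$missing" >&2; return 1; }
}

DB_HOST=db.internal   # set
DB_PORT=              # set but empty
unset API_TOKEN       # not set at all

report=$(require_env DB_HOST DB_PORT API_TOKEN 2>&1)
rc=$?
printf 'status=%s report=%s\n' "$rc" "$report"
```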
🛰️ CI/CD
🛰️ CI must fail fast
Silent failures → broken deployments.
🛰️ Always enable strict mode in CI
🛰️ Log errors clearly
🧠 Summary
Error handling in shell scripting is about:
- strict mode
- explicit boundaries
- predictable exit behavior
- traps
- validation
- avoiding silent failures
Mastering these techniques makes scripts safe, deterministic, and production‑ready.