🚨 Advanced Shell Error Handling

🧠 Overview

Error handling in shell scripting is deceptively complex. Unlike modern languages, the shell has:

no exceptions
no stack unwinding
no structured error types
implicit propagation rules
context‑dependent exit behavior
silent failure modes

This document explains how to design predictable, fail‑fast, production‑grade error handling in POSIX shells.

🎓 Who this is for

DevOps/SRE writing CI/CD pipelines or deployment scripts.
Engineers building automation, entrypoints, or orchestration logic.
Anyone who wants deterministic behavior instead of “shell roulette”.
People debugging silent failures, partial runs, or inconsistent exit codes.

🧩 Internals / Mechanics

🧩 Exit codes

Every command returns an integer exit status:

0 → success
1–125 → command‑defined failure
126 → command found but not executable
127 → command not found
128+N → terminated by signal N

Stored in $?.

🧩 How the shell decides to stop or continue

By default:

the shell does not stop on errors
pipelines return the exit code of the last command
functions propagate exit codes
subshells isolate exit behavior

🧩 `set -e` (errexit)

set -e stops the script when a command fails, except in these cases:

inside if, while, until, &&, ||
inside pipelines (unless pipefail is set)
inside command substitutions
inside subshells

This makes set -e powerful but dangerous if misunderstood.

🧩 `set -o pipefail`

Without pipefail:

false | true
echo $?   # 0

With pipefail:

set -o pipefail
false | true
echo $?   # 1

🔧 Techniques

🔧 Use the “strict mode” trio

set -euo pipefail

-e → fail fast
-u → error on unset variables
-o pipefail → fail on pipeline errors

🔧 Validate all inputs early

: "${WORKSPACE:?must be set}"
: "${ENV:?missing ENV}"

🔧 Use `|| exit` for explicit control

command || { echo "failed"; exit 1; }

🔧 Use functions to encapsulate error boundaries

deploy() {
  set -e
  ...
}

🔧 Use `trap` for cleanup

trap 'cleanup; exit 1' ERR

⚠️ Pitfalls

⚠️ `set -e` not triggering when expected

set -e
cmd || true   # suppresses errexit

⚠️ Silent failures in pipelines

docker build . | tee log.txt

⚠️ Unset variables causing unexpected crashes

echo "$UNSET"   # with -u → script aborts

⚠️ Errors hidden in command substitutions

result=$(failing_cmd)   # exit code lost

⚠️ Subshells swallowing errors

( failing_cmd )
echo $?   # parent sees 0 unless set -e inside subshell

🚨 Real‑World Failures

🚨 Failure: Deployment script continues after failed build

npm run build
docker build .

If npm run build fails, script continues → deploys broken image.

Fix:

set -e
npm run build
docker build .

🚨 Failure: CI pipeline passes despite failing stage

lint | tee lint.log

lint fails → tee succeeds → pipeline exit = 0.

Fix:

set -o pipefail
lint | tee lint.log

🚨 Failure: Cleanup not executed on error

trap cleanup EXIT

If the script crashes before setting the trap → no cleanup.

Fix:

Set traps at the top of the script.

🛠️ Patterns

🛠️ Pattern: Fail fast, fail loud

set -euo pipefail

🛠️ Pattern: Explicit error boundaries

safe_section() {
  set -e
  ...
}

🛠️ Pattern: Use `ERR` trap for global error handling

trap 'echo "Error on line $LINENO"' ERR

🛠️ Pattern: Validate invariants before doing anything destructive

[ -d "$TARGET" ] || exit 1

❌ Anti‑Patterns

❌ Anti‑pattern: Ignoring exit codes

cmd
next_cmd

❌ Anti‑pattern: Using `echo` for error messages

Use:

printf '%s\n' "error" >&2

❌ Anti‑pattern: Relying on implicit behavior of `set -e`

❌ Anti‑pattern: Swallowing errors with `|| true`

🔍 Debugging

🔍 Use `set -x` to trace execution

Shows expanded commands before execution.

🔍 Use `PS4` for detailed tracing

export PS4='+ ${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
set -x

🔍 Use `trap '...' ERR` to log failures

⚙️ Performance

⚙️ Avoid expensive error‑checking loops

Prefer batch validation.

⚙️ Avoid unnecessary command substitutions

They fork.

⚙️ Use builtins for checks

[[ -f file ]]

🧵 Process Control

🧵 Errors inside subshells do not propagate

( failing_cmd )
echo $?   # often 0

🧵 Pipelines hide errors without pipefail

🧵 Traps behave differently in subshells

🐳 Containers

🐳 PID 1 ignores some signals

Error handlers must explicitly trap SIGTERM.

🐳 Entry points must validate environment variables

Containers often run with missing or empty env vars.

🛰️ CI/CD

🛰️ CI must fail fast

Silent failures → broken deployments.

🛰️ Always enable strict mode in CI

set -euo pipefail

🛰️ Log errors clearly

🧠 Summary

Error handling in shell scripting is about:

strict mode
explicit boundaries
predictable exit behavior
traps
validation
avoiding silent failures

Mastering these techniques makes scripts safe, deterministic, and production‑ready.

🚨 Advanced Shell Error Handling

🧠 Overview

🎓 Who this is for

🧩 Internals / Mechanics

🧩 Exit codes

🧩 How the shell decides to stop or continue

🧩 set -e (errexit)

🧩 set -o pipefail

🔧 Techniques

🔧 Use the “strict mode” trio

🔧 Validate all inputs early

🔧 Use || exit for explicit control

🔧 Use functions to encapsulate error boundaries

🔧 Use trap for cleanup

⚠️ Pitfalls

⚠️ set -e not triggering when expected

⚠️ Silent failures in pipelines

⚠️ Unset variables causing unexpected crashes

⚠️ Errors hidden in command substitutions

⚠️ Subshells swallowing errors

🚨 Real‑World Failures

🚨 Failure: Deployment script continues after failed build

🚨 Failure: CI pipeline passes despite failing stage

🚨 Failure: Cleanup not executed on error

🛠️ Patterns

🛠️ Pattern: Fail fast, fail loud

🛠️ Pattern: Explicit error boundaries

🛠️ Pattern: Use ERR trap for global error handling

🛠️ Pattern: Validate invariants before doing anything destructive

❌ Anti‑Patterns

❌ Anti‑pattern: Ignoring exit codes

❌ Anti‑pattern: Using echo for error messages

❌ Anti‑pattern: Relying on implicit behavior of set -e

❌ Anti‑pattern: Swallowing errors with || true

🔍 Debugging

🔍 Use set -x to trace execution

🔍 Use PS4 for detailed tracing

🔍 Use trap '...' ERR to log failures

⚙️ Performance

⚙️ Avoid expensive error‑checking loops

⚙️ Avoid unnecessary command substitutions

⚙️ Use builtins for checks

🧵 Process Control

🧵 Errors inside subshells do not propagate

🧵 Pipelines hide errors without pipefail

🧵 Traps behave differently in subshells

🐳 Containers

🐳 PID 1 ignores some signals

🐳 Entry points must validate environment variables

🛰️ CI/CD

🛰️ CI must fail fast

🛰️ Always enable strict mode in CI

🛰️ Log errors clearly

🧠 Summary

🧩 `set -e` (errexit)

🧩 `set -o pipefail`

🔧 Use `|| exit` for explicit control

🔧 Use `trap` for cleanup

⚠️ `set -e` not triggering when expected

🛠️ Pattern: Use `ERR` trap for global error handling

❌ Anti‑pattern: Using `echo` for error messages

❌ Anti‑pattern: Relying on implicit behavior of `set -e`

❌ Anti‑pattern: Swallowing errors with `|| true`

🔍 Use `set -x` to trace execution

🔍 Use `PS4` for detailed tracing

🔍 Use `trap '...' ERR` to log failures