
🚨 Advanced Shell Error Handling

🧠 Overview

Error handling in shell scripting is deceptively complex. Unlike modern languages, the shell has:

  • no exceptions
  • no stack unwinding
  • no structured error types
  • implicit propagation rules
  • context‑dependent exit behavior
  • silent failure modes

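This document explains how to design predictable, fail‑fast, production‑grade error handling in Bash. Several features used here (pipefail, ERR traps, [[ ]], BASH_SOURCE) are Bash extensions and are not available in every POSIX shell.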


🎓 Who this is for

  • DevOps/SRE writing CI/CD pipelines or deployment scripts.
  • Engineers building automation, entrypoints, or orchestration logic.
  • Anyone who wants deterministic behavior instead of “shell roulette”.
  • People debugging silent failures, partial runs, or inconsistent exit codes.

🧩 Internals / Mechanics

🧩 Exit codes

Every command returns an integer exit status:

  • 0 → success
  • 1–125 → command‑defined failure
  • 126 → command found but not executable
  • 127 → command not found
  • 128+N → terminated by signal N

The status of the most recent command is stored in $? and is overwritten by the next command.
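As a minimal sketch of these ranges, the snippet below captures $? right after a few commands. The name no_such_command_xyz is a placeholder that is assumed not to exist on the system, and the signal case uses a child Bash that terminates itself.

```shell
#!/usr/bin/env bash
# Capture $? immediately: the next command overwrites it.
true
ok=$?

false
fail=$?

# Placeholder name, assumed not to be installed anywhere on PATH.
no_such_command_xyz 2>/dev/null
missing=$?

# A child shell that sends itself SIGTERM (signal 15) is reported as 128+15.
bash -c 'kill -TERM $$'
signaled=$?

printf 'ok=%s fail=%s missing=%s signaled=%s\n' "$ok" "$fail" "$missing" "$signaled"
```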

🧩 How the shell decides to stop or continue

By default:

  • the shell does not stop on errors
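  • a pipeline returns the exit code of its last command
  • a function returns the exit code of the last command it ran, or of an explicit return
  • a subshell reports failure only through its own exit code; exit inside it ends the subshell, not the script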

🧩 set -e (errexit)

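set -e stops the script when a command fails, except when the failing command is:

  • the condition of if, while, or until
  • any command in a && or || list except the last
  • negated with !
  • any command in a pipeline except the last (with pipefail, the pipeline's own status reflects the failure)
  • inside a command substitution: Bash clears errexit there unless shopt -s inherit_errexit (Bash 4.4+) is set, and forms such as local var=$(cmd) discard the substitution's status

Subshells are not exempt: they inherit set -e, and a failing subshell aborts the parent like any other command. However, a function or subshell called from one of the contexts above runs with errexit ignored for its entire body.

This makes set -e powerful but dangerous if misunderstood.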
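The exemptions are easiest to see side by side. The sketch below runs each case in a child Bash so that an errexit abort does not end the demo itself; the function name f is illustrative.

```shell
#!/usr/bin/env bash
# 1. A bare failing command aborts the child: "after" is never printed.
bare=$(bash -c 'set -e; false; echo after')

# 2. A failure used as an `if` condition is exempt.
in_if=$(bash -c 'set -e; if false; then :; fi; echo after')

# 3. A failure on the left-hand side of || is exempt.
in_or=$(bash -c 'set -e; false || echo handled; echo after')

# 4. A function called as a condition runs with errexit ignored for its whole body.
in_func=$(bash -c 'set -e; f() { false; echo "still running"; }; if f; then :; fi')

printf 'bare=[%s]\nin_if=[%s]\nin_or=[%s]\nin_func=[%s]\n' \
  "$bare" "$in_if" "$in_or" "$in_func"
```

Case 4 is the one that most often surprises: the failure inside f is silently skipped because of how f was called, not because of anything in f itself.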

🧩 set -o pipefail

Without pipefail:

false | true
echo $?   # 0

With pipefail:

set -o pipefail
false | true
echo $?   # 1
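When you need to know which stage failed, Bash also records one exit code per stage in the PIPESTATUS array. A small sketch, run in child shells so the pipefail setting does not leak into the calling script:

```shell
#!/usr/bin/env bash
# PIPESTATUS holds one exit code per pipeline stage; read it before running anything else.
stages=$(bash -c 'false | true | (exit 3); echo "${PIPESTATUS[*]}"')

# With pipefail, the pipeline reports the rightmost non-zero exit code.
rightmost=$(bash -c 'set -o pipefail; (exit 2) | (exit 5) | true; echo "$?"')

printf 'stages: %s\npipefail status: %s\n' "$stages" "$rightmost"
```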

🔧 Techniques

🔧 Use the “strict mode” trio

set -euo pipefail

  • -e → fail fast
  • -u → error on unset variables
  • -o pipefail → fail on pipeline errors

🔧 Validate all inputs early

: "${WORKSPACE:?must be set}"
: "${ENV:?missing ENV}"
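A sketch of what the ${VAR:?message} check does at run time. The check runs in a child shell so the abort can be observed; LOG_LEVEL is a hypothetical optional variable used to contrast the :- fallback form.

```shell
#!/usr/bin/env bash
# The child aborts at the ':' line, so "unreachable" is never printed.
msg=$(bash -c 'unset WORKSPACE; : "${WORKSPACE:?must be set}"; echo unreachable' 2>&1)
status=$?

# ${VAR:-default} supplies a fallback instead of aborting (hypothetical variable name).
unset LOG_LEVEL
level=${LOG_LEVEL:-info}

printf 'status=%s level=%s\nmessage: %s\n' "$status" "$level" "$msg"
```

Use :? for values the script cannot run without, and :- for values with a sensible default.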

🔧 Use || exit for explicit control

command || { echo "failed" >&2; exit 1; }
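Repeating the braces everywhere gets noisy, so many scripts wrap the pattern in a small helper. The die function below is an illustrative convention, not a builtin; the demo runs inside a command substitution so the calling script keeps going.

```shell
#!/usr/bin/env bash
# die STATUS MESSAGE...: print the message to stderr and exit with STATUS.
die() {
  local status=$1
  shift
  printf 'error: %s\n' "$*" >&2
  exit "$status"
}

# `false` stands in for a real command; the substitution is a subshell,
# so the exit inside die ends only that subshell.
out=$( { false || die 2 "build step failed"; } 2>&1 )
die_status=$?

printf 'status=%s output=%s\n' "$die_status" "$out"
```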

🔧 Use functions to encapsulate error boundaries

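set -e inside a { } function body is not scoped: it stays enabled for the rest of the script. Use a ( ) body when the setting must stay local:

deploy() (
  # The body runs in a subshell, so set -e and any exit stay inside it.
  set -e
  ...
)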
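A runnable sketch of a function that acts as an error boundary by using a subshell body. The step functions are hypothetical stand-ins for real work. Note the call style: the function is invoked on its own line and $? is read afterwards, because calling it from if, && or || would make Bash ignore errexit inside it.

```shell
#!/usr/bin/env bash
# Hypothetical steps: the second one fails with status 3.
step_ok()   { echo "step ok"; }
step_fail() { return 3; }

# Parentheses instead of braces: the body runs in a subshell.
deploy() (
  set -e
  step_ok
  step_fail
  echo "never reached"
)

deploy_out=$(deploy)
deploy_status=$?

# errexit did not leak into the calling shell.
case $- in *e*) leaked=yes ;; *) leaked=no ;; esac

printf 'status=%s output=%s leaked=%s\n' "$deploy_status" "$deploy_out" "$leaked"
```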

🔧 Use trap for cleanup

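Prefer EXIT over ERR for cleanup. An EXIT trap runs on both success and failure, including when set -e aborts the script, and it leaves the exit code intact; ERR is a Bash extension that fires only for failing commands.

trap cleanup EXIT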
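The following sketch checks that a cleanup handler registered on EXIT still runs when errexit aborts a script, and that the failing exit code survives. The child script, its temporary directory, and the record file are all illustrative.

```shell
#!/usr/bin/env bash
record=$(mktemp)

# Child script: register the trap first, then create the resource, then fail.
bash -c '
  set -euo pipefail
  tmpdir=""
  cleanup() { if [ -n "$tmpdir" ]; then rm -rf "$tmpdir"; fi; }
  trap cleanup EXIT
  tmpdir=$(mktemp -d)
  printf "%s\n" "$tmpdir" > "$1"
  false                      # simulated failure; errexit aborts here
  echo "unreachable"
' _ "$record"
child_status=$?

tmpdir_path=$(cat "$record")
rm -f "$record"
if [ -e "$tmpdir_path" ]; then cleaned=no; else cleaned=yes; fi

printf 'child exit=%s, temp dir removed=%s\n' "$child_status" "$cleaned"
```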

⚠️ Pitfalls

⚠️ set -e not triggering when expected

set -e
cmd || true   # suppresses errexit

⚠️ Silent failures in pipelines

docker build . | tee log.txt   # exit code is tee's, so a failed build still reports 0

⚠️ Unset variables causing unexpected crashes

echo "$UNSET"   # with -u → script aborts

⚠️ Errors hidden in command substitutions

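local result=$(failing_cmd)   # inside a function: exit code lost, $? is local's own status (0)
result=$(failing_cmd)         # exit code kept: a plain assignment returns the substitution's status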
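Whether the status survives depends on how the substitution is used. A minimal sketch with two illustrative functions: combining local with the assignment masks the failure, while declaring first and assigning second preserves it.

```shell
#!/usr/bin/env bash
masked() {
  local out=$(false)   # $? is the status of `local` itself, which succeeds
  echo "$?"
}

split() {
  local out
  out=$(false)         # plain assignment: the status of the substitution survives
  echo "$?"
}

masked_status=$(masked)
split_status=$(split)

printf 'local+assign: %s, declare then assign: %s\n' "$masked_status" "$split_status"
```

The same masking applies to export and readonly, and to substitutions embedded in an argument such as echo "$(cmd)".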

⚠️ Subshells swallowing errors

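A subshell does pass its exit code to the parent, but only the status of its last command (or of exit) counts:

( failing_cmd; echo "done" )
echo $?   # 0: echo succeeded, so the failure of failing_cmd is lost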
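A short sketch of what does and does not cross the subshell boundary; the variable names are illustrative.

```shell
#!/usr/bin/env bash
count=1

# Assignments and exit stay inside the subshell; only the exit code comes back.
( count=2; exit 7 )
sub_status=$?

# Only the last command counts: the earlier failure is masked.
( false; true )
masked_status=$?

printf 'count=%s sub_status=%s masked_status=%s\n' "$count" "$sub_status" "$masked_status"
```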

🚨 Real‑World Failures

🚨 Failure: Deployment script continues after failed build

npm run build
docker build .

If npm run build fails, the script continues → builds and deploys a broken image.

Fix:

set -e
npm run build
docker build .

🚨 Failure: CI pipeline passes despite failing stage

lint | tee lint.log

lint fails → tee succeeds → pipeline exit = 0.

Fix:

set -o pipefail
lint | tee lint.log

🚨 Failure: Cleanup not executed on error

trap cleanup EXIT

If the script crashes before setting the trap → no cleanup.

Fix:

Set traps at the top of the script, before anything that needs cleanup is created.


🛠️ Patterns

🛠️ Pattern: Fail fast, fail loud

set -euo pipefail

🛠️ Pattern: Explicit error boundaries

safe_section() (
  set -e   # scoped to this subshell; ignored if safe_section is called from if, && or ||
  ...
)

🛠️ Pattern: Use ERR trap for global error handling

set -E   # errtrace: functions and subshells inherit the ERR trap (Bash)
trap 'echo "Error on line $LINENO" >&2' ERR
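A slightly fuller sketch of a global handler, assuming Bash. It adds set -E so the trap also fires for failures inside functions, and reports the exit code and the failing command as well as the line. The handler name on_error and the temporary script are illustrative.

```shell
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
set -Eeuo pipefail

# Receives the exit code, line number and command text of the failure.
on_error() {
  printf 'error: exit %s at line %s: %s\n' "$1" "$2" "$3" >&2
}
trap 'on_error "$?" "$LINENO" "$BASH_COMMAND"' ERR

inner() { false; }   # the failure happens inside a function
inner
echo "unreachable"
EOF

err_log=$(bash "$script" 2>&1)
err_status=$?
rm -f "$script"

printf 'child exit=%s\n%s\n' "$err_status" "$err_log"
```

Passing $?, $LINENO and $BASH_COMMAND as arguments captures them at the moment of failure, before the handler's own commands change them.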

🛠️ Pattern: Validate invariants before doing anything destructive

[ -d "$TARGET" ] || { printf 'not a directory: %s\n' "$TARGET" >&2; exit 1; }
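For destructive operations it can help to collect the checks in one guard. The helper below, clean_dir, is a hypothetical example that empties a directory only after every invariant holds; the demo runs against a throwaway temporary directory.

```shell
#!/usr/bin/env bash
# clean_dir TARGET: remove the non-hidden contents of TARGET, refusing unsafe input.
clean_dir() {
  local target=${1:-}
  [ -n "$target" ]     || { printf 'refusing: no target given\n' >&2; return 1; }
  [ -d "$target" ]     || { printf 'refusing: not a directory: %s\n' "$target" >&2; return 1; }
  [ "$target" != "/" ] || { printf 'refusing: target is /\n' >&2; return 1; }
  # ${target:?} aborts on an empty value, so this can never expand to /*.
  rm -rf -- "${target:?}"/*
}

demo=$(mktemp -d)
touch "$demo/old.txt"

clean_dir "$demo"
ok_status=$?
if [ -e "$demo/old.txt" ]; then removed=no; else removed=yes; fi

clean_dir "" 2>/dev/null
empty_status=$?

clean_dir "$demo/missing" 2>/dev/null
missing_status=$?

rmdir "$demo"
printf 'ok=%s removed=%s empty=%s missing=%s\n' \
  "$ok_status" "$removed" "$empty_status" "$missing_status"
```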

❌ Anti‑Patterns

❌ Anti‑pattern: Ignoring exit codes

cmd
next_cmd

❌ Anti‑pattern: Using echo for error messages

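Plain echo writes to stdout, where error text mixes with data, and its handling of backslashes and options such as -n varies between shells. Send errors to stderr with printf:

printf '%s\n' "error" >&2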

❌ Anti‑pattern: Relying on implicit behavior of set -e

❌ Anti‑pattern: Swallowing errors with || true
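When a non-zero status is expected, one alternative to || true is to capture the status and decide per value. The sketch below uses grep, which exits 1 when nothing matches and with a higher code on a real error; the log file and its contents are made up for the demo.

```shell
#!/usr/bin/env bash
log=$(mktemp)
printf 'build ok\n' >"$log"

# Capture the status instead of discarding it; this form is also safe under set -e.
status=0
grep -q 'ERROR' "$log" || status=$?

case $status in
  0) result="errors found" ;;
  1) result="clean" ;;                          # no match: expected, not a failure
  *) result="grep itself failed ($status)" ;;   # unreadable file, bad pattern, ...
esac

rm -f "$log"
echo "$result"
```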


🔍 Debugging

🔍 Use set -x to trace execution

set -x prints each command to stderr after expansion, just before it runs.

🔍 Use PS4 for detailed tracing

# ${FUNCNAME[0]:-main} avoids an unbound-variable error under set -u outside functions.
PS4='+ ${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:-main}: '
set -x
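Tracing can also be limited to one region. The sketch below, assuming Bash, writes a small illustrative script, traces a single function call, and captures only the trace output; the greet function is hypothetical.

```shell
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
PS4='+ ${BASH_SOURCE##*/}:${LINENO}:${FUNCNAME[0]:-main}: '
greet() { echo "hello $1"; }
set -x
greet world
{ set +x; } 2>/dev/null   # switch tracing off without logging the switch itself
echo "tracing off"
EOF

# The trace goes to stderr: keep it and drop the script's normal output.
trace=$(bash "$script" 2>&1 >/dev/null)
rm -f "$script"

printf '%s\n' "$trace"
```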

🔍 Use trap '...' ERR to log failures


⚙️ Performance

⚙️ Avoid expensive error‑checking loops

Prefer batch validation.
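One way to batch validation, assuming Bash for the ${!name} indirect expansion: check every required variable in a single pass and report all of the missing ones at once, instead of failing on the first. The function name require_env and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# require_env NAME...: succeed only if every named variable is set and non-empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      printf 'missing required variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

APP_ENV=staging
unset DB_URL API_TOKEN

report=$(require_env APP_ENV DB_URL API_TOKEN 2>&1)
require_status=$?

printf 'status=%s\n%s\n' "$require_status" "$report"
```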

⚙️ Avoid unnecessary command substitutions

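Each $(...) forks a subshell, which adds up inside loops. Prefer parameter expansion or a builtin when one does the same job.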
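For simple path handling, parameter expansion covers common uses of basename and dirname without starting a process. A sketch with an illustrative path:

```shell
#!/usr/bin/env bash
path=/var/log/app/server.log

base=${path##*/}   # strip the longest prefix ending in '/'  (like basename)
dir=${path%/*}     # strip the shortest suffix starting at '/' (like dirname)
ext=${path##*.}    # keep what follows the last '.'
stem=${base%.*}    # file name without its extension

printf 'base=%s dir=%s ext=%s stem=%s\n' "$base" "$dir" "$ext" "$stem"
```

The expansions do not handle every edge case the external tools do (a trailing slash or a path without any slash, for example), so they fit best where the input shape is known.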

⚙️ Use builtins for checks

[[ -f file ]]

🧵 Process Control

🧵 Errors inside subshells propagate only as an exit code

( failing_cmd )
echo $?   # non-zero, but exit, variable assignments, and set options inside ( ) never reach the parent

🧵 Pipelines hide errors without pipefail

🧵 Traps behave differently in subshells
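One visible difference, sketched under the assumption of Bash: traps set by the parent are reset inside a ( ) subshell, so the parent's EXIT trap does not run when the subshell exits. It runs once, when the parent itself exits.

```shell
#!/usr/bin/env bash
# The child shell sets an EXIT trap, runs a failing subshell, then finishes.
out=$(bash -c 'trap "echo parent-exit-trap" EXIT; ( exit 3 ); echo "subshell status: $?"')

# Count how many times the trap message appears in the output.
trap_runs=$(printf '%s\n' "$out" | grep -c 'parent-exit-trap')

printf '%s\ntrap ran %s time(s)\n' "$out" "$trap_runs"
```

A subshell that needs its own cleanup has to install its own trap.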


🐳 Containers

🐳 PID 1 ignores some signals

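The kernel applies no default signal action to PID 1, so a SIGTERM that the script has not trapped is simply ignored. Entry points must explicitly trap SIGTERM.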
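A minimal sketch of an entry point that traps SIGTERM and forwards it to the process it supervises. The sleep 30 command is a stand-in for the real service, and the self-signal after one second exists only to simulate a stop request in the demo.

```shell
#!/usr/bin/env bash
# Stand-in for the real service process.
sleep 30 &
child=$!

# Forward termination requests to the child instead of ignoring them.
forward_term() { kill -TERM "$child" 2>/dev/null; }
trap forward_term TERM INT

# Demo only: simulate a stop request by signalling this script after one second.
( sleep 1; kill -TERM "$$" ) &

# wait returns early (status above 128) when a trapped signal arrives,
# so wait a second time to collect the child's real exit code.
wait "$child"
wait "$child"
child_status=$?

echo "child exited with $child_status"
```

When the entry point has no work of its own after setup, replacing the shell with exec avoids the forwarding problem altogether, because the service then receives signals directly.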

🐳 Entry points must validate environment variables

Containers often run with missing or empty env vars.


🛰️ CI/CD

🛰️ CI must fail fast

Silent failures → broken deployments.

🛰️ Always enable strict mode in CI

set -euo pipefail

🛰️ Log errors clearly


🧠 Summary

Error handling in shell scripting is about:

  • strict mode
  • explicit boundaries
  • predictable exit behavior
  • traps
  • validation
  • avoiding silent failures

Mastering these techniques makes scripts safe, deterministic, and production‑ready.