🛰️ Advanced Shell CI/CD
🧠 Overview
Shell scripts are the backbone of CI/CD systems. They glue together:
- build tools
- test runners
- artifact pipelines
- deployment logic
- container tooling
- cloud CLIs
But CI/CD environments are non‑interactive, ephemeral, strict, and unforgiving. This module teaches how to write deterministic, idempotent, fail‑fast, production‑grade shell scripts for CI/CD pipelines.
CI/CD is where shell scripts must be the most predictable, because:
- failures must be explicit
- logs must be structured
- environment must be validated
- commands must be deterministic
- pipelines must not hide errors
- subshells must not swallow exit codes
- containers must not hang on shutdown
This module expands the Extended version into a full Advanced reference.
🎓 Who this is for
- DevOps/SRE building or maintaining CI/CD pipelines
- Engineers writing build/test/deploy automation
- Anyone dealing with flaky pipelines, silent failures, or inconsistent behavior
- People who want predictable, reproducible automation
- Engineers working with GitHub Actions, GitLab CI, Jenkins, Azure Pipelines, CircleCI, Argo, Tekton
🧩 Role in the Ecosystem
CI/CD scripting interacts with:
- Advanced Shell Expansions
- Subshells & Environment
- Advanced Pipelines
- Process Control
- Advanced Error Handling
- Shell in Containers
- POSIX Shell Compatibility
CI/CD is the stress test of shell correctness.
🧩 Internals / Mechanics
🧩 CI/CD is a hostile environment
CI/CD runners typically have:
- no interactive shell
- no aliases
- no user dotfiles
- minimal PATH
- strict timeouts
- unpredictable parallelism
- ephemeral filesystems
- limited logging
- minimal locale settings
- non‑login shells
- no job control
- no TTY
This means:
- no assumptions about environment
- no interactive features
- no implicit PATH
- no shell startup files
- no user configuration
🧩 CI/CD shells must be deterministic
Determinism requires:
- explicit environment validation
- explicit exit behavior
- explicit dependencies
- explicit paths
- explicit cleanup
- explicit error handling
- explicit logging
A CI/CD script must behave identically:
- on every runner
- on every OS image
- on every container
- on every retry
- on every branch
🧩 CI/CD pipelines rely heavily on exit codes
- 0 → success
- non-zero → fail the job
- pipelines without pipefail hide failures
- command substitution hides failures
- subshells isolate failures
- background jobs detach and hide failures
- set -e does NOT propagate through pipelines
This is why CI/CD scripts must explicitly configure strict mode (set -euo pipefail) at the top of every script.
🧩 Deep Internals: How CI/CD shells differ from normal shells
🧩 No interactive features
CI shells do NOT support:
- job control
- prompts
- readline
- interactive read
- interactive sudo
- interactive editors
🧩 No user environment
CI shells do NOT load:
- .bashrc
- .profile
- .bash_profile
- .zshrc
This means:
- no aliases
- no functions
- no PATH modifications
- no environment defaults
🧩 Runners use different shells
GitHub Actions → bash
GitLab CI → bash if the job image provides it, otherwise sh (dash or busybox)
Jenkins → depends on agent
Docker runners → /bin/sh (often busybox)
Alpine → ash
Ubuntu → dash for /bin/sh
This means:
- no Bash‑only features unless explicitly using bash
- no arrays
- no brace expansion
- no process substitution
- no extglob
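Unless you explicitly run the script with bash, for example by invoking it as bash script.sh or by giving it a #!/usr/bin/env bash shebang on an image that ships bash.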
🧩 Subshell behavior in CI/CD
Pipelines spawn subshells.
In POSIX shells:
- each segment runs in a subshell
- variables modified inside do NOT propagate
This causes massive CI/CD bugs.
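A minimal sketch of the bug and one fix, assuming bash or dash (where every pipeline segment runs in a subshell). The variable names are illustrative only.

```shell
# Counting lines by piping into a loop: the loop runs in a subshell,
# so the parent shell never sees the updated counter.
count=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
  count=$((count + 1))
done
lost_count=$count          # still 0 in bash and dash

# Fix: feed the loop through a redirection so it runs in the current shell.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done <<EOF
$(printf 'a\nb\nc\n')
EOF
kept_count=$count          # 3

printf 'piped loop: %s, redirected loop: %s\n' "$lost_count" "$kept_count"
```

The here-document keeps the loop in the parent shell, which is why the second counter survives.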
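🧩 Command substitution strips trailing newlines
$(...) removes every trailing newline from the captured output, and an unquoted expansion of the result also collapses interior newlines.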
This breaks:
- JSON
- YAML
- multi‑line secrets
- certificates
- SSH keys
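The effect can be sketched with a stand-in producer. The function name produce and the sentinel trick are illustrative, not something the original text prescribes.

```shell
# The producer emits two lines followed by an extra blank line.
produce() { printf 'line1\nline2\n\n'; }

captured=$(produce)              # all trailing newlines are removed
captured_len=${#captured}        # 11: "line1" + newline + "line2"

# Unquoted expansion additionally collapses the interior newline.
flattened=$(printf '%s %s' $captured)

# Workaround: append a sentinel character, then strip it again.
exact=$(produce; printf x)
exact=${exact%x}
exact_len=${#exact}              # 13: both trailing newlines preserved

printf 'captured=%s flattened=%s exact=%s\n' "$captured_len" "$flattened" "$exact_len"
```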
🧩 Word splitting destroys arguments
If an unquoted $pattern contains spaces → the command receives multiple arguments instead of one.
🧩 Globbing expands unexpectedly
If an unquoted $FILES contains * → the shell expands it against the filesystem before the command runs.
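Both effects can be observed by counting the arguments a command actually receives. This is a small sketch; count_args and the scratch directory are hypothetical helpers.

```shell
count_args() { printf '%s\n' "$#"; }

pattern='two words'
unquoted_args=$(count_args $pattern)     # 2: split on the space
quoted_args=$(count_args "$pattern")     # 1

# Globbing: run inside a scratch directory that contains three files.
scratch=$(mktemp -d)
touch "$scratch/a.txt" "$scratch/b.txt" "$scratch/c.txt"
FILES='*'
unquoted_glob=$(cd "$scratch" && count_args $FILES)     # 3: a.txt b.txt c.txt
quoted_glob=$(cd "$scratch" && count_args "$FILES")     # 1: the literal *
rm -rf -- "$scratch"
```

Quoting every expansion is the only change needed in both cases.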
🔧 Techniques
🔧 Always enable strict mode
Start every Bash script with set -euo pipefail.
Adds:
- -e → exit on error
- -u → error on unset variables
- -o pipefail → pipeline fails if any segment fails
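What each flag changes can be sketched by running tiny snippets in child bash processes, so that their failures cannot abort the demo itself. DEMO_UNSET_VARIABLE is a made-up name that is assumed not to exist in the environment.

```shell
#!/usr/bin/env bash
# A real script would simply begin with: set -euo pipefail

e_status=0
bash -c 'set -e; false; echo "not reached"' || e_status=$?

u_status=0
bash -c 'set -u; echo "$DEMO_UNSET_VARIABLE"' 2>/dev/null || u_status=$?

no_pipefail_status=0
bash -c 'false | true' || no_pipefail_status=$?

pipefail_status=0
bash -c 'set -o pipefail; false | true' || pipefail_status=$?

printf 'set -e: %s, set -u: %s, no pipefail: %s, pipefail: %s\n' \
  "$e_status" "$u_status" "$no_pipefail_status" "$pipefail_status"
```

Only the pipefail run reports the failure of the first pipeline segment.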
🔧 Validate all required environment variables
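Check every required variable before doing any work, and exit with a clear message if one is unset or empty.
🔧 Validate required tools
Confirm that every required command is on PATH (for example with command -v) and fail early if one is missing.
🔧 Use absolute paths
Call tools and reference files by absolute path instead of relying on PATH or the current working directory.
🔧 Log everything to stderr
Send log messages to stderr so that stdout stays reserved for data consumed by other commands.
🔧 Use trap for cleanup
Register a trap on EXIT that removes temporary files, so cleanup runs whether the script succeeds or fails.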
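A minimal sketch of trap-based cleanup, assuming mktemp is available. run_job is a hypothetical job body; its ( ) function body gives it a subshell of its own, so the EXIT trap fires when that subshell ends, even though a step fails.

```shell
run_job() (
  workdir=$(mktemp -d) || exit 1
  trap 'rm -rf -- "$workdir"' EXIT
  printf '%s\n' "$workdir"        # report the path so the caller can inspect it
  touch "$workdir/artifact.tmp"
  false                           # simulate a failing build step
)

job_status=0
job_dir=$(run_job) || job_status=$?
printf 'job exited with %s; workdir removed: ' "$job_status"
if [ -e "$job_dir" ]; then printf 'no\n'; else printf 'yes\n'; fi
```

In a top-level script the same trap line goes right after the mktemp call.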
⚠️ Pitfalls
⚠️ Silent failures in pipelines
In a docker build | tee pipeline without pipefail: docker build fails → tee succeeds → pipeline exit = 0.
⚠️ Unset variables breaking deployments
A deploy command interpolates $ENV directly. If $ENV is empty → deploys to wrong environment.
⚠️ Race conditions in parallel CI jobs
⚠️ Using interactive commands
⚠️ Relying on user dotfiles
🚨 Real‑World Failures
🚨 Build passes despite failing tests
🚨 Deployment script deploys to production accidentally
🚨 CI job hangs due to background process
🛠️ Patterns
🛠️ One script per responsibility
🛠️ Idempotent scripts
🛠️ Explicit inputs, explicit outputs
🛠️ Use JSON for structured output
🛠️ Fail fast, fail loud
❌ Anti‑Patterns
- Using echo for structured data
- Relying on implicit PATH
- Running interactive commands
- Silent failure swallowing
- Using pipelines for state mutation
10. Deep CI/CD Internals
CI/CD systems are not just “shells that run scripts”. They are distributed orchestration engines with:
- isolated runners
- ephemeral filesystems
- containerized execution
- strict timeouts
- parallel job scheduling
- caching layers
- artifact storage
- environment injection
- secret management
- sandboxing
- network isolation
Understanding these internals is essential for writing deterministic shell scripts.
10.1 CI Runner Architecture
Different CI systems use different execution models:
GitHub Actions
- each step runs in a fresh shell
- environment does NOT persist between steps
- default shell: bash on Linux, pwsh on Windows
- composite actions run in subshells
- containers run as isolated jobs
GitLab CI
- default shell: bash if the job image provides it, otherwise /bin/sh (dash or busybox)
- jobs run inside Docker or Podman
- each job has its own filesystem
- artifacts must be explicitly saved
Jenkins
- depends on agent
- may run on bare metal, VM, or container
- shell may be bash, dash, zsh, or busybox
Azure Pipelines
- uses bash on Linux
- PowerShell on Windows
- jobs run in containers or VMs
CircleCI
- jobs run inside containers
- /bin/sh is often busybox
- minimal environment
10.2 Shell Differences in CI/CD
Many CI/CD jobs run under a POSIX /bin/sh (dash or busybox ash), not Bash.
This means:
| Feature | Bash | POSIX (dash/ash) |
|---|---|---|
| Arrays | ✔ | ❌ |
| Associative arrays | ✔ | ❌ |
| Brace expansion | ✔ | ❌ |
| Process substitution | ✔ | ❌ |
| [[ ]] | ✔ | ❌ |
| echo -e | inconsistent | inconsistent |
| source | ✔ | ❌ (. only) |
| mapfile | ✔ | ❌ |
| shopt | ✔ | ❌ |
If your script uses Bash features, you MUST explicitly run it under bash (a bash shebang, or an explicit bash invocation in the job definition).
Otherwise the runner will use /bin/sh.
10.3 Environment Injection
CI/CD systems inject environment variables:
- repository metadata
- commit SHA
- branch name
- pipeline ID
- job ID
- secrets
- tokens
- paths
But:
- values may contain spaces
- values may contain newlines
- values may contain JSON
- values may contain YAML
- values may contain special characters
This breaks:
- unquoted expansions
- word splitting
- command substitution
- globbing
Example: an injected variable may contain JSON with newlines → every expansion of it must be quoted.
10.4 Secret Handling
Secrets may contain:
- newlines
- spaces
- quotes
- backslashes
- base64
- PEM blocks
Never expand a secret unquoted or pass it to echo.
Always quote the expansion and emit it with printf '%s'.
10.5 Filesystem Semantics
CI/CD filesystems are:
- ephemeral
- isolated
- often mounted as tmpfs
- sometimes read‑only
- sometimes container overlay layers
This affects:
- caching
- artifact storage
- temporary files
- cleanup
- concurrency
11. Advanced Error Semantics
11.1 set -e is NOT enough
set -e does NOT trigger on:
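- commands tested by if, while, or until (if cmd)
- commands on the left of || or && (cmd || true)
- every pipeline segment except the last (cmd | other)
- background jobs (cmd &)
- [[ ]] tests used as conditions
- assignments made through local, export, or readonly (local v=$(cmd))
- subshells and functions called from any of the contexts above
This is why CI/CD scripts MUST use the full set -euo pipefail and still check critical commands explicitly.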
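Three of these blind spots can be reproduced in isolation. The sketch runs each case in a child bash so the outcome can be captured; the function names f and g are placeholders.

```shell
#!/usr/bin/env bash
# 1. A failing command substitution is masked by local (same for export and readonly).
masked_out=$(bash -c 'set -e; f() { local v=$(false); echo "still running"; }; f')

# 2. Declaring first and assigning separately lets set -e see the failure.
split_status=0
split_out=$(bash -c 'set -e; f() { local v; v=$(false); echo "still running"; }; f') || split_status=$?

# 3. Inside an if condition set -e is ignored, even within the called function.
cond_out=$(bash -c 'set -e; g() { false; echo "continued past false"; }; if g; then :; fi')

printf 'masked: %s | split: status %s | condition: %s\n' \
  "$masked_out" "$split_status" "$cond_out"
```

Case 2 is the usual remedy: keep declaration and assignment on separate lines whenever the value comes from a command.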
11.2 pipefail is mandatory
Without pipefail:
In cmd1 | cmd2 | cmd3, if cmd1 fails but cmd3 succeeds → pipeline exit = 0.
This is the #1 cause of silent CI failures.
11.3 Subshells hide failures
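A failure inside ( ... ) ends only the subshell. If the parent does not check the subshell's exit status → parent shell continues.
11.4 Command substitution hides failures
When $(dangerous_command) appears inside another command's arguments, or in a local or export assignment, and dangerous_command fails → exit code is lost.
11.5 Background jobs hide failures
A command started with & returns control immediately. Its exit code is detached until the script calls wait on its PID.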
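The exit code can be recovered by remembering the PID and waiting for it. A minimal sketch, with a subshell standing in for a real background task:

```shell
# Stand-in for a real background task; it fails with status 7.
( exit 7 ) &
job_pid=$!

# ... foreground work would happen here ...

job_status=0
wait "$job_pid" || job_status=$?
printf 'background job exited with status %s\n' "$job_status"
```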
12. Advanced Pipelines in CI/CD
Pipelines in CI/CD behave differently than interactive shells.
12.1 Each pipeline segment runs in a subshell
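A variable assigned inside any segment of a pipeline is discarded when the pipeline ends.
12.2 Pipelines break variable propagation
Never pipe into a while read loop that sets variables the rest of the script depends on.
Do redirect the input into the loop instead (from a file or a here-document), so the loop runs in the current shell.
12.3 Pipelines break set -e
Example: in cmd1 | cmd2 under set -e but without pipefail, if cmd1 fails → pipeline continues, and the script sees only cmd2's exit status.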
12.4 Pipelines break trap ERR
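trap ERR does NOT fire for a failing early pipeline segment unless pipefail is set (set -o pipefail), and it is not inherited by functions or subshells unless errtrace is enabled (set -E).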
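A short sketch of how the ERR trap interacts with pipefail and errtrace in bash. Each case runs in a child bash so its output can be compared; the trap body is a placeholder.

```shell
#!/usr/bin/env bash
# Pipeline whose first segment fails, with and without pipefail.
without_pipefail=$(bash -c 'trap "echo ERR fired" ERR; false | true; echo done')
with_pipefail=$(bash -c 'set -o pipefail; trap "echo ERR fired" ERR; false | true; echo done')

# A failure inside a function body, with and without errtrace (set -E).
without_errtrace=$(bash -c 'trap "echo ERR fired" ERR; f() { false; true; }; f; echo done')
with_errtrace=$(bash -c 'set -E; trap "echo ERR fired" ERR; f() { false; true; }; f; echo done')

printf '%s\n' "no pipefail: $without_pipefail" "pipefail: $with_pipefail"
```

Scripts that rely on an ERR trap therefore usually combine set -E with set -o pipefail.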
13. Advanced Subshell Behavior
Subshells are everywhere in CI/CD:
- pipelines
- command substitution
- process substitution
- grouping ( )
- background jobs
- subshells inside Docker entrypoints
Subshells:
- isolate variables
- isolate traps
- isolate set -e
- isolate set -u
- isolate pipefail
- isolate working directory
13.1 Subshells break stateful logic
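Counters, flags, and arrays changed inside a subshell are discarded when the subshell exits.
13.2 Subshells break cleanup
Traps set by the parent are reset inside a subshell, and a trap installed there disappears with it, so cleanup registered in the wrong shell never runs.
13.3 Subshells break environment mutation
export and cd inside a subshell affect only that subshell; the parent's environment and working directory stay unchanged.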
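The isolation can be observed directly. In this sketch DEMO_DEPLOY_TARGET is a made-up variable, assumed to be unset beforehand.

```shell
counter=0
start_dir=$PWD

(
  counter=5                        # changes only the subshell's copy
  DEMO_DEPLOY_TARGET=staging
  export DEMO_DEPLOY_TARGET
  cd /
)
after_subshell="counter=$counter target=${DEMO_DEPLOY_TARGET:-unset}"

# A brace group runs in the current shell, so its changes persist.
{ counter=5; }
after_group="counter=$counter"

printf '%s | %s | cwd unchanged: %s\n' "$after_subshell" "$after_group" "$PWD"
```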
14. Advanced CI/CD Logging
Logging must be:
- structured
- timestamped
- machine‑readable
- stderr‑based
- consistent
14.1 Use stderr for logs
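Write every log line to stderr so that stdout carries only data.
14.2 Use JSON for machine‑readable logs
Emit one JSON object per line, with every value escaped, so log collectors can parse it without regular expressions.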
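One possible shape for both kinds of logger, written for bash. The function names and the JSON field names (ts, level, msg) are assumptions, and json_escape handles only backslash, double quote, newline, and tab, so treat it as a sketch rather than a complete encoder.

```shell
#!/usr/bin/env bash
# Plain log line: timestamp, level, message, always on stderr.
log() {
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >&2
}

# Escape the characters that most often break hand-built JSON.
json_escape() {
  local s=$1 bs='\' dq='"' nl=$'\n' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}          # backslash first, so later escapes stay intact
  s=${s//"$dq"/"$bs$dq"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

log_json() {
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(json_escape "$1")" "$(json_escape "$2")" >&2
}

log INFO 'build started'
log_json ERROR 'step "deploy" failed'
```

Where jq is installed, building the object with jq is a more robust alternative to hand-written escaping.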
14.3 Avoid echo
echo is:
- non‑portable
- inconsistent
- locale‑dependent
- unpredictable about backslash escapes and options such as -n and -e
Use printf.
15. Advanced CI/CD Environment Validation
15.1 Validate required variables
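Fail before the first build step if any required variable is unset or empty.
15.2 Validate required tools
Fail if any required command cannot be found with command -v.
15.3 Validate required directories
Fail if an expected directory is missing.
15.4 Validate required files
Fail if an expected file is missing or unreadable.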
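The four checks can be sketched as small POSIX sh helpers. All function names here are hypothetical; each returns non-zero instead of exiting, so the caller decides how to abort.

```shell
err() { printf 'error: %s\n' "$*" >&2; }

# require_env NAME...: every named variable must be set and non-empty.
require_env() {
  for _name in "$@"; do
    case $_name in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) err "invalid variable name: $_name"; return 1 ;;
    esac
    eval "_value=\${$_name:-}"     # indirect lookup; the name was validated above
    [ -n "$_value" ] || { err "required variable $_name is unset or empty"; return 1; }
  done
}

# require_cmd TOOL...: every tool must be resolvable on PATH.
require_cmd() {
  for _cmd in "$@"; do
    command -v "$_cmd" >/dev/null 2>&1 || { err "required tool not found: $_cmd"; return 1; }
  done
}

require_dir()  { [ -d "$1" ] || { err "required directory missing: $1"; return 1; }; }
require_file() { { [ -f "$1" ] && [ -r "$1" ]; } || { err "required file missing or unreadable: $1"; return 1; }; }

# Example calls with values that exist on any runner.
require_cmd sh
require_dir /
```

A job would typically call these first, for example require_env followed by its variable names and then || exit 1.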
16. Advanced CI/CD Secrets Handling
Secrets may contain:
- newlines
- spaces
- quotes
- JSON
- YAML
- PEM blocks
- base64
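Never:
Expand a secret unquoted, pass it through echo, or leave set -x tracing on while it is handled.
Always:
Quote the expansion and write the value with printf '%s' to a file or pipe that only the job can read.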
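A minimal sketch of writing a multi-line secret to a file without corrupting it. write_secret and the demo value are invented for illustration; a real secret would come from the CI system's injected variable.

```shell
# write_secret VALUE PATH
write_secret() {
  (
    umask 077                       # new files are readable by the owner only
    printf '%s\n' "$1" > "$2"
  )
}

demo_secret='-----BEGIN DEMO KEY-----
line with spaces, "quotes", a \backslash and a * glob
-----END DEMO KEY-----'

secret_file=$(mktemp)
write_secret "$demo_secret" "$secret_file"
round_trip=$(cat "$secret_file")
rm -f -- "$secret_file"
```

The value survives byte for byte because it is only ever expanded inside double quotes and printed through a fixed format string.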
17. Advanced CI/CD Concurrency
CI/CD systems run jobs:
- in parallel
- on different machines
- in different containers
- with different environments
This creates:
- race conditions
- inconsistent state
- conflicting deployments
- corrupted caches
- partial artifacts
17.1 Use file locks
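Serialize writers with an exclusive lock around every write to a shared path, for example with the flock utility.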
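A small sketch of lock-protected writes, assuming the flock utility (util-linux or BusyBox) is installed on the runner. with_lock and the file names are hypothetical.

```shell
# with_lock LOCKFILE COMMAND [ARGS...]: run COMMAND while holding an exclusive lock.
with_lock() {
  _lockfile=$1
  shift
  flock "$_lockfile" "$@"
}

lock=$(mktemp)
shared=$(mktemp)

# Two concurrent writers append to the same file, one at a time.
with_lock "$lock" sh -c 'printf "job-a\n" >> "$1"' sh "$shared" &
pid_a=$!
with_lock "$lock" sh -c 'printf "job-b\n" >> "$1"' sh "$shared" &
pid_b=$!
wait "$pid_a"
wait "$pid_b"

line_count=$(($(wc -l < "$shared")))
rm -f -- "$lock" "$shared"
```

A lock file only coordinates jobs that share a filesystem; jobs on different machines need a lock provided by the CI system or an external service.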
17.2 Use atomic writes
Write to a temporary file in the destination directory, then rename it into place with mv.
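A possible helper for this pattern, assuming a mktemp that accepts a template path. atomic_write is an illustrative name; the temporary file lives next to the destination so that mv is a rename on the same filesystem.

```shell
# atomic_write DEST: copy stdin to DEST so readers never see a partial file.
atomic_write() {
  _dest=$1
  _tmp=$(mktemp "$_dest.XXXXXX") || return 1
  if cat > "$_tmp" && mv -f -- "$_tmp" "$_dest"; then
    return 0
  fi
  rm -f -- "$_tmp"
  return 1
}

target_dir=$(mktemp -d)
printf 'version=1\n' | atomic_write "$target_dir/state.env"
printf 'version=2\n' | atomic_write "$target_dir/state.env"
final_state=$(cat "$target_dir/state.env")
leftovers=$(($(ls -A "$target_dir" | wc -l)))
rm -rf -- "$target_dir"
```

Note that mktemp creates the file with owner-only permissions, so a chmod may be needed if other users must read the result.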
17.3 Use retries with backoff
Retry transient failures a bounded number of times and double the delay after each attempt.
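A sketch of a bounded retry with exponential backoff. The retry function and its argument order are assumptions; the demo uses a delay of 0 seconds and a stand-in command that succeeds on its third attempt.

```shell
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry() {
  _max=$1
  _delay=$2
  shift 2
  _attempt=1
  while :; do
    if "$@"; then
      return 0
    else
      _status=$?
    fi
    if [ "$_attempt" -ge "$_max" ]; then
      printf 'giving up after %s attempts: %s\n' "$_attempt" "$*" >&2
      return "$_status"
    fi
    sleep "$_delay"
    _delay=$((_delay * 2))          # exponential backoff
    _attempt=$((_attempt + 1))
  done
}

# Demo: a command that fails twice, then succeeds.
attempt_log=$(mktemp)
flaky() {
  printf 'x' >> "$attempt_log"
  [ "$(($(wc -c < "$attempt_log")))" -ge 3 ]
}
retry 5 0 flaky
attempts_used=$(($(wc -c < "$attempt_log")))
rm -f -- "$attempt_log"
```

Only idempotent commands should be wrapped this way, since a retried step may have partially succeeded.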
18. Advanced CI/CD Caching
Caching in CI/CD is fundamentally different from caching on a developer machine. It is:
- non‑deterministic — runners differ between jobs
- inconsistent across machines — different OS images, different shells
- invalidated by timestamps — filesystem metadata changes unpredictably
- invalidated by permissions — UID/GID differ between runners
- invalidated by environment differences — locale, PATH, tool versions
- sensitive to container layers — overlayfs behaves differently
- sensitive to concurrency — parallel jobs corrupt shared caches
Caching must be explicit, content‑addressed, validated, and reproducible.
18.1 Use content‑addressable caching
The only reliable cache key is a hash of the inputs.
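Example: compute the cache key as a SHA-256 digest of the files that determine the build output, such as dependency lockfiles.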
This ensures:
- cache invalidates when inputs change
- cache persists when inputs are identical
- cache is independent of timestamps
- cache is independent of runner environment
Content‑addressable caching is the only deterministic strategy.
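One way to derive such a key, assuming sha256sum (GNU coreutils or BusyBox) is available. cache_key and the lockfile are illustrative; the key covers both file names and contents, so call it with relative paths from the project root.

```shell
# cache_key FILE...: print one SHA-256 digest covering all given files.
cache_key() {
  _sums=$(sha256sum -- "$@") || return 1      # fails if any input is missing
  printf '%s\n' "$_sums" | sha256sum | cut -d ' ' -f 1
}

project=$(mktemp -d)
printf '{"lockfileVersion": 3}\n' > "$project/package-lock.json"
key_before=$(cd "$project" && cache_key package-lock.json)

touch "$project/package-lock.json"            # new mtime, same content
key_after_touch=$(cd "$project" && cache_key package-lock.json)

printf '{"lockfileVersion": 4}\n' > "$project/package-lock.json"
key_after_edit=$(cd "$project" && cache_key package-lock.json)
rm -rf -- "$project"
```

Touching the file leaves the key unchanged, while editing it produces a new key.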
18.2 Avoid timestamp‑based caching
Timestamp‑based caching fails in CI/CD because:
- files are restored with fresh timestamps
- container layers rewrite timestamps
- Git checkouts normalize timestamps
- artifact downloads rewrite timestamps
- runners may use different timezones
- clocks may drift between machines
Example of a broken pattern: redeploying only when deploy.yaml tests as newer than deploy.cache (an -nt comparison).
This fails because:
- deploy.yaml may appear “newer” even if unchanged
- deploy.cache may appear “older” after restore
- overlayfs may rewrite mtime on extraction
Never rely on -nt, -ot, or mtime in CI/CD.
18.3 Avoid relying on filesystem metadata
Filesystem metadata is not stable across CI/CD runners:
- permissions differ
- ownership differs
- inode numbers differ
- mtime/ctime differ
- overlayfs rewrites metadata
- Docker COPY normalizes metadata
- Git checkouts normalize metadata
Broken pattern: deriving a cache key from stat output built from the format fields below.
This fails because:
- %Y (mtime) changes on extraction
- %U (owner) differs between runners
- %G (group) differs
- %i (inode) differs
- %h (link count) differs
Metadata is not portable. Never use it for caching logic.
18.4 Always validate cache contents
Caches must be validated, not trusted.
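Example: recompute the checksum of the restored cache and compare it with the stored value; on mismatch, delete the cache and rebuild.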
Validation strategies:
- hash validation — store a checksum next to the cache
- structural validation — ensure required files exist
- semantic validation — ensure versions match
- toolchain validation — ensure compiler/interpreter versions match
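Example semantic validation: compare the tool or dependency version recorded in the cache with the version on the current runner, and discard the cache if they differ.
Example structural validation: confirm that every file and directory the build expects is present in the restored cache before using it.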
Never assume a cache is valid just because it exists.
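Hash validation can be sketched with a checksum manifest stored inside the cache. The manifest name SHA256SUMS and both function names are assumptions, and sha256sum is assumed to be installed.

```shell
# seal_cache DIR: record a checksum for every file in the cache.
seal_cache() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# cache_is_valid DIR: true only if the manifest exists and every file matches it.
cache_is_valid() {
  [ -f "$1/SHA256SUMS" ] || return 1
  ( cd "$1" && sha256sum -c SHA256SUMS >/dev/null 2>&1 )
}

cache=$(mktemp -d)
printf 'compiled-object\n' > "$cache/lib.o"
seal_cache "$cache"
if cache_is_valid "$cache"; then fresh=valid; else fresh=invalid; fi

printf 'corrupted\n' > "$cache/lib.o"         # simulate a damaged restore
if cache_is_valid "$cache"; then tampered=valid; else tampered=invalid; fi
rm -rf -- "$cache"
```

A job would call seal_cache just before saving the cache and cache_is_valid right after restoring it, rebuilding from scratch when the check fails.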
19. Advanced Real‑World Failures
Real CI/CD failures are rarely caused by “syntax errors”. They come from:
- subshell isolation
- hidden exit codes
- race conditions
- corrupted caches
- broken pipelines
- missing environment
- inconsistent runners
- zombie processes
- PID1 behavior in containers
- newline stripping
- word splitting
- globbing
- tool version drift
Below is a full list of real production incidents, with root-cause analysis and fixes.
19.1 Failure: Pipeline passes despite failing tests
Symptom
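The test step pipes npm test into tee to keep a log:
npm test fails → tee succeeds → pipeline exit = 0.
Root cause
- pipeline exit code = exit code of last command
- tee exits 0 as long as it can write its output
- set -e does NOT fix this
- trap ERR does NOT fire for the failed segment
Fix
Enable set -o pipefail before the pipeline, or in bash inspect PIPESTATUS right after it.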
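The PIPESTATUS route can be sketched as follows for bash. run_tests is a stand-in for the real test command, so the example runs without npm.

```shell
#!/usr/bin/env bash
run_tests() { echo "1 test failed"; return 1; }

log_file=$(mktemp)
run_tests | tee "$log_file" >/dev/null
statuses=("${PIPESTATUS[@]}")     # copy immediately; the next command overwrites it

if [ "${statuses[0]}" -ne 0 ]; then
  verdict="tests failed with status ${statuses[0]}"
else
  verdict="tests passed"
fi
rm -f -- "$log_file"
printf '%s\n' "$verdict"
```

With set -o pipefail the check is implicit; PIPESTATUS is useful when the script needs to know which segment failed.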
19.2 Failure: Deployment script deploys to production accidentally
Symptom
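The deploy command interpolates $ENV without validating it.
Root cause
- missing environment validation
- $ENV empty or wrong
- CI runner uses default kubeconfig
- deployment goes to wrong cluster
Fix
Validate $ENV against an explicit allow-list before the deploy command runs, and abort on any other value.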
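An allow-list check might look like this. The environment names dev, staging, and production are placeholders for whatever targets the pipeline really has.

```shell
# validate_target ENVIRONMENT: succeed only for a known deployment target.
validate_target() {
  case ${1:-} in
    dev|staging|production) return 0 ;;
    '') printf 'error: deployment environment is empty\n' >&2; return 1 ;;
    *)  printf 'error: unknown deployment environment: %s\n' "$1" >&2; return 1 ;;
  esac
}

if validate_target staging; then known_result=accepted; else known_result=rejected; fi
if validate_target '' 2>/dev/null; then empty_result=accepted; else empty_result=rejected; fi
if validate_target prod-typo 2>/dev/null; then typo_result=accepted; else typo_result=rejected; fi
```

The deploy step would call validate_target with the quoted variable and exit on failure before any cluster command runs.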
19.3 Failure: CI job hangs due to background process
Symptom
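The script starts a helper process with & and then finishes without stopping it.
Root cause
- background job keeps running
- runner waits for all processes
- job never terminates
Fix
Kill the background process and wait for it before the script exits, ideally from an EXIT trap.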
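A sketch of that fix, with sleep standing in for the helper (a dev server, a port-forward, and so on). run_with_helper is a hypothetical job body that runs in its own subshell.

```shell
run_with_helper() (
  # Redirect the helper's output so it does not hold the job's log pipe open.
  sleep 300 >/dev/null 2>&1 &
  helper_pid=$!
  trap 'kill "$helper_pid" 2>/dev/null; wait "$helper_pid" 2>/dev/null' EXIT

  printf '%s\n' "$helper_pid"      # the real job steps would run here
)

helper_pid=$(run_with_helper)
if kill -0 "$helper_pid" 2>/dev/null; then
  helper_state=running
else
  helper_state=stopped
fi
printf 'helper %s is %s after the job\n' "$helper_pid" "$helper_state"
```

The trap runs on every exit path of the job body, so the helper is stopped on failure as well as on success.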
19.4 Failure: Multi‑line secrets break scripts
Symptom
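The script writes the secret out through an unquoted $SECRET expansion.
SECRET contains:
- newlines
- quotes
- PEM blocks
Output becomes corrupted.
Fix
Quote the expansion and write it with printf '%s\n' "$SECRET".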
19.5 Failure: JSON corrupted by command substitution
Symptom
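A JSON document is captured with $(...) and expanded again later.
Command substitution strips trailing newlines, and an unquoted expansion also collapses the remaining whitespace → JSON invalid.
Fix
Quote every expansion of the captured value and print it with printf '%s\n'.
or:
Avoid the intermediate variable and stream the JSON straight into the tool or file that consumes it.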
19.6 Failure: Race condition between parallel CI jobs
Symptom
Two jobs write to same cache directory.
Root cause
- no locking
- no atomic writes
- shared filesystem
Fix
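Hold an exclusive lock while writing to the shared cache directory, and publish files with an atomic rename.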
19.7 Failure: Wrong tool version used
Symptom
python3 is different on different runners.
Fix
Validate the output of python3 --version against the version the pipeline expects.
Fail if mismatch.
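The comparison itself can be a simple prefix match. In this sketch fake_python3 stands in for python3 --version so the example runs anywhere, and the pinned version is made up.

```shell
# version_matches REQUIRED ACTUAL: true if ACTUAL equals REQUIRED or extends it
# with further dot-separated components ("Python 3.11" accepts "Python 3.11.4").
version_matches() {
  case $2 in
    "$1"|"$1".*) return 0 ;;
    *) return 1 ;;
  esac
}

fake_python3() { printf 'Python 3.11.4\n'; }

required='Python 3.11'
actual=$(fake_python3)
if version_matches "$required" "$actual"; then
  version_check=ok
else
  version_check=mismatch
fi
printf 'required "%s", found "%s": %s\n' "$required" "$actual" "$version_check"
```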
19.8 Failure: Docker build uses stale cache
Symptom
Docker layer cache invalidated unpredictably.
Fix
Use content‑addressable cache keys.
20. Advanced Debugging Techniques
Debugging CI/CD scripts requires visibility, not guesswork.
20.1 Enable debug mode conditionally
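Turn on set -x only when a debug variable is set, so normal runs keep short logs and do not trace secrets.
20.2 Use PS4 for deep tracing
Set PS4 to a prefix built from BASH_SOURCE, LINENO, and FUNCNAME so every traced line carries its location.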
Shows:
- file
- line
- function
- expanded command
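Conditional tracing with a location-aware PS4 might be wired up like this in bash. CI_DEBUG is a hypothetical switch; substitute whatever variable the pipeline already defines.

```shell
#!/usr/bin/env bash
enable_trace() {
  if [ "${CI_DEBUG:-0}" = "1" ]; then
    # Single quotes delay expansion until each traced line is printed.
    PS4='+ ${BASH_SOURCE[0]:-main}:${LINENO}:${FUNCNAME[0]:-top}: '
    set -x
  fi
}

# Demo: capture the trace of one command from a subshell, with and without the switch.
trace=$( ( CI_DEBUG=1; enable_trace; : "traced command" ) 2>&1 )
quiet=$( ( CI_DEBUG=0; enable_trace; : "traced command" ) 2>&1 )
printf '%s\n' "$trace"
```

Because set -x prints expanded values, tracing should stay off around any line that handles a secret.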
20.3 Trace syscalls
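Run the failing command under strace (for example strace -f -o trace.log followed by the command) where the runner permits ptrace.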
Useful for:
- FD leaks
- zombie processes
- missing files
- permission issues
20.4 Trace signals
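Watch signal delivery, for example with strace -e trace=signal on the process, or by logging from trap handlers.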
Useful for:
- SIGTERM not delivered
- PID1 ignoring signals
- stuck containers
20.5 Inspect environment snapshot
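Dump the environment in a stable order, for example env | sort, and mask secret values before they reach the log.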
Useful for:
- missing variables
- wrong values
- overwritten secrets
20.6 Inspect file descriptors
List the open descriptors of the current shell, for example ls -l /proc/$$/fd on Linux.
Useful for:
- leaked pipes
- leaked sockets
- broken redirections
20.7 Inspect subshell boundaries
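Add a line that prints $$ next to $BASHPID (bash): $$ stays the same inside subshells, while $BASHPID changes.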
Useful for:
- pipeline subshells
- command substitution subshells
- Docker entrypoint subshells
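A bash-only sketch of such a probe. where_am_i is an invented helper; it writes to stderr so it can be dropped into pipelines without disturbing their data.

```shell
#!/usr/bin/env bash
# where_am_i LABEL: $$ is the script's PID everywhere; BASHPID changes per subshell.
where_am_i() {
  printf '%s: $$=%s BASHPID=%s\n' "$1" "$$" "$BASHPID" >&2
}

main_bashpid=$BASHPID
subshell_bashpid=$(printf '%s' "$BASHPID")   # command substitution is a subshell
subshell_dollar=$(printf '%s' "$$")          # still the parent's PID

where_am_i main
( where_am_i "inside ( )" )
true | where_am_i "inside pipeline"
```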
21. Advanced Performance Engineering
CI/CD performance is not about CPU — it’s about:
- avoiding unnecessary forks
- avoiding unnecessary clones
- avoiding unnecessary downloads
- avoiding unnecessary rebuilds
- avoiding unnecessary pipelines
- avoiding unnecessary subshells
21.1 Avoid slow loops
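Bad: a shell loop that starts a new process (grep, sed, cut) for every line or file.
Good: a single invocation of the tool that processes the whole input in one pass.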
21.2 Avoid unnecessary cloning
Use shallow clones, for example git clone --depth 1 followed by the repository URL.
21.3 Avoid unnecessary scanning
Cache dependency graphs.
21.4 Avoid unnecessary forks
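Bad: calling an external command for work the shell can do itself, such as $(basename "$path") inside a loop.
Good: a parameter expansion such as ${path##*/}, which needs no fork.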
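A few fork-free equivalents, shown on a made-up artifact path. These are POSIX parameter expansions, so they work in dash and busybox ash as well as bash.

```shell
artifact=/builds/project/dist/app.tar.gz

file_name=${artifact##*/}        # like basename: app.tar.gz
dir_name=${artifact%/*}          # like dirname:  /builds/project/dist
base_name=${file_name%%.*}       # strip all extensions: app
extension=${file_name#*.}        # everything after the first dot: tar.gz

printf '%s | %s | %s | %s\n' "$file_name" "$dir_name" "$base_name" "$extension"
```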
21.5 Use streaming tools
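- jq
- awk
- sed
- grep
awk, sed, and grep process input line by line instead of loading entire files into memory; jq parses each JSON document whole unless run with --stream.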
22. Advanced Process Control in CI/CD
Process control is critical in CI/CD because:
- runners send SIGTERM on timeout
- containers use PID1 semantics
- background jobs cause hangs
- subshells isolate traps
- pipelines break error propagation
22.1 Always handle SIGTERM
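Install a trap for TERM that stops child processes, runs cleanup, and exits with a non-zero status.
22.2 Always reap children
Call wait on every background PID the script starts, so exit codes are collected and no zombies remain.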
22.3 Avoid background jobs unless necessary
Background jobs:
- hide exit codes
- break determinism
- cause hangs
- cause zombie accumulation
22.4 Use exec when launching long‑running processes
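Make the final line of the wrapper script exec followed by the service command, so the service replaces the shell.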
Ensures:
- correct signal forwarding
- no zombie shell
- no double PID tree
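That exec keeps the same process can be checked by comparing PIDs. The wrapper script here is a throwaway written to a temporary file, and a one-line sh command stands in for the long-running service.

```shell
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
echo "wrapper pid: $$"
# exec replaces this shell with the service instead of forking a child
exec sh -c 'echo "service pid: $$"'
EOF

output=$(sh "$wrapper")
rm -f -- "$wrapper"

wrapper_pid=$(printf '%s\n' "$output" | sed -n 's/^wrapper pid: //p')
service_pid=$(printf '%s\n' "$output" | sed -n 's/^service pid: //p')
printf 'wrapper=%s service=%s\n' "$wrapper_pid" "$service_pid"
```

Both lines report the same PID, which is why signals sent to the wrapper's PID reach the service directly.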
23. Advanced Signals in CI/CD
CI/CD runners send:
- SIGTERM → graceful shutdown
- SIGKILL → forced shutdown
- SIGINT → manual cancellation
Containers often ignore SIGTERM unless:
- PID1 forwards signals
- entrypoint uses exec
- init system (tini/dumb-init) is used
23.1 Detecting signal handling issues
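Send SIGTERM to the job (or run docker stop on the container) and time the shutdown. If the process only dies when the grace period expires and SIGKILL arrives, SIGTERM is not reaching it.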
23.2 Forwarding signals manually
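If the script cannot exec the child, trap TERM in the script, pass the signal on to the child's PID with kill, and wait for the child to exit.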
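A sketch of manual forwarding in bash. The entrypoint is written to a temporary file and sleep stands in for the real long-running process; the one-second pause only gives the script time to install its trap.

```shell
#!/usr/bin/env bash
entrypoint=$(mktemp)
cat > "$entrypoint" <<'EOF'
#!/usr/bin/env bash
got_term=0
forward_term() {
  got_term=1
  echo "forwarding SIGTERM to child $child"
  kill -TERM "$child" 2>/dev/null
}
trap forward_term TERM

sleep 30 &                          # stand-in for the real process
child=$!
status=0
wait "$child" || status=$?          # returns early when the trap fires
if [ "$got_term" -eq 1 ]; then
  status=0
  wait "$child" || status=$?        # now collect the child's real exit status
fi
exit "$status"
EOF

output_file=$(mktemp)
bash "$entrypoint" > "$output_file" 2>&1 &
entry_pid=$!
sleep 1
kill -TERM "$entry_pid"
entry_status=0
wait "$entry_pid" || entry_status=$?
forwarded=$(cat "$output_file")
rm -f -- "$entrypoint" "$output_file"
printf 'entrypoint exited with %s\n' "$entry_status"
```

The second wait matters: the first one is interrupted by the signal, so the child's own status is only available afterwards.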
24. Advanced Containers Interactions
CI/CD often runs inside containers. This introduces:
- PID1 semantics
- zombie reaping
- signal forwarding
- overlayfs behavior
- missing environment
- missing PATH
- missing locale
24.1 Always use exec in entrypoints
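End the entrypoint script with exec "$@" so the application replaces the shell as PID 1.
24.2 Use tini or dumb-init
Run the container under a minimal init, for example with docker run --init or by making tini or dumb-init the image's ENTRYPOINT.
24.3 Validate environment inside container
Repeat the variable and tool checks inside the container; settings present on the runner are not automatically present in the image.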
24.4 Avoid long‑running loops
Containers must exit cleanly.
25. Advanced CI/CD Lifecycle
A CI/CD job has phases:
- environment injection
- workspace preparation
- dependency installation
- build
- test
- packaging
- deployment
- cleanup
Each phase must be:
- deterministic
- idempotent
- validated
- logged
- fail‑fast
26. Full Patterns
26.1 Pattern: One script per responsibility
26.2 Pattern: Idempotent scripts
26.3 Pattern: Explicit inputs, explicit outputs
26.4 Pattern: JSON for structured output
26.5 Pattern: Fail fast, fail loud
26.6 Pattern: Validate everything
26.7 Pattern: Use exec for long‑running processes
26.8 Pattern: Use content‑addressable caching
26.9 Pattern: Use atomic writes
26.10 Pattern: Use file locks
27. Full Anti‑Patterns
- Using echo for structured data
- Relying on implicit PATH
- Running interactive commands
- Silent failure swallowing
- Using pipelines for state mutation
- Using timestamp‑based caching
- Using metadata‑based caching
- Using background jobs without cleanup
- Using shell as PID1
- Using tail -f as main process
28. Summary
CI/CD shell scripting requires:
- strict mode
- deterministic behavior
- explicit validation
- predictable exit codes
- safe pipelines
- structured logging
- idempotency
- no interactive assumptions
- correct signal handling
- correct process control
- correct subshell behavior
- correct caching
- correct environment validation
- correct container semantics
Mastering these techniques produces reliable, reproducible, production‑grade pipelines.