🛰️ Advanced Shell CI/CD

🧠 Overview

Shell scripts are the backbone of CI/CD systems. They glue together:

  • build tools
  • test runners
  • artifact pipelines
  • deployment logic
  • container tooling
  • cloud CLIs

But CI/CD environments are non‑interactive, ephemeral, strict, and unforgiving. This module teaches how to write deterministic, idempotent, fail‑fast, production‑grade shell scripts for CI/CD pipelines.

CI/CD is where shell scripts must be the most predictable, because:

  • failures must be explicit
  • logs must be structured
  • environment must be validated
  • commands must be deterministic
  • pipelines must not hide errors
  • subshells must not swallow exit codes
  • containers must not hang on shutdown

This module expands the Extended version into a full Advanced reference.


🎓 Who this is for

  • DevOps/SRE building or maintaining CI/CD pipelines
  • Engineers writing build/test/deploy automation
  • Anyone dealing with flaky pipelines, silent failures, or inconsistent behavior
  • People who want predictable, reproducible automation
  • Engineers working with GitHub Actions, GitLab CI, Jenkins, Azure Pipelines, CircleCI, Argo, Tekton

🧩 Role in the Ecosystem

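CI/CD scripting interacts with every tool listed in the overview (build tools, test runners, artifact pipelines, deployment logic, container tooling, and cloud CLIs), which makes CI/CD the stress test of shell correctness.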


🧩 Internals / Mechanics

🧩 CI/CD is a hostile environment

CI/CD runners typically have:

  • no interactive shell
  • no aliases
  • no user dotfiles
  • minimal PATH
  • strict timeouts
  • unpredictable parallelism
  • ephemeral filesystems
  • limited logging
  • minimal locale settings
  • non‑login shells
  • no job control
  • no TTY

This means:

  • no assumptions about environment
  • no interactive features
  • no implicit PATH
  • no shell startup files
  • no user configuration

🧩 CI/CD shells must be deterministic

Determinism requires:

  • explicit environment validation
  • explicit exit behavior
  • explicit dependencies
  • explicit paths
  • explicit cleanup
  • explicit error handling
  • explicit logging

A CI/CD script must behave identically:

  • on every runner
  • on every OS image
  • on every container
  • on every retry
  • on every branch

🧩 CI/CD pipelines rely heavily on exit codes

  • 0 → success
  • non‑zero → fail the job
  • pipelines without pipefail hide failures
  • command substitution hides failures
  • subshells isolate failures
  • background jobs detach and hide failures
  • set -e does NOT propagate through pipelines

This is why CI/CD scripts must explicitly configure:

set -euo pipefail

🧩 Deep Internals: How CI/CD shells differ from normal shells

🧩 No interactive features

CI shells do NOT support:

  • job control
  • prompts
  • readline
  • interactive read
  • interactive sudo
  • interactive editors

🧩 No user environment

CI shells do NOT load:

  • .bashrc
  • .profile
  • .bash_profile
  • .zshrc

This means:

  • no aliases
  • no functions
  • no PATH modifications
  • no environment defaults

🧩 Runners use different shells

  • GitHub Actions → bash
  • GitLab CI → sh (dash or busybox)
  • Jenkins → depends on agent
  • Docker runners → /bin/sh (often busybox)
  • Alpine → ash
  • Ubuntu → dash for /bin/sh

This means:

  • no Bash‑only features unless explicitly using bash
  • no arrays
  • no brace expansion
  • no process substitution
  • no extglob

Unless you explicitly run:

bash script.sh

🧩 Subshell behavior in CI/CD

Pipelines spawn subshells:

cmd1 | cmd2 | cmd3

In POSIX shells:

  • each segment runs in a subshell
  • variables modified inside do NOT propagate

Example:

count=0
echo "a b c" | while read _; do
  count=$((count+1))
done
echo "$count"   # prints 0

This causes massive CI/CD bugs.


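🧩 Command substitution strips trailing newlines

x=$(printf "a\nb\n")
printf "%s" "$x"   # prints "a", a newline, then "b"; the final newline is gone

Interior newlines survive, but every trailing newline is removed. This breaks content whose exact bytes matter: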

  • JSON
  • YAML
  • multi‑line secrets
  • certificates
  • SSH keys
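
The stripping is easy to measure. The following is a minimal sketch, assuming nothing beyond a POSIX shell; the variable names `raw` and `kept` and the sentinel character `x` are illustrative choices:

```shell
# Plain substitution: all trailing newlines are removed.
raw=$(printf 'a\nb\n\n')

# Sentinel trick: append a character so the newlines are no longer trailing,
# then strip the sentinel again.
kept=$(printf 'a\nb\n\n'; printf x)
kept=${kept%x}

printf 'raw length:  %s\n' "${#raw}"    # 3  (a, newline, b)
printf 'kept length: %s\n' "${#kept}"   # 5  (a, newline, b, newline, newline)
```

The interior newline survives in both cases; only the trailing ones differ.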

🧩 Word splitting destroys arguments

grep $pattern file

If $pattern contains spaces → multiple arguments.
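
A small sketch makes the split visible. `count_args` is a hypothetical helper introduced only for this illustration, and the default `IFS` is assumed:

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { printf '%s\n' "$#"; }

pattern='hello world'
unquoted=$(count_args $pattern)     # split on the space
quoted=$(count_args "$pattern")     # passed through intact

printf 'unquoted: %s argument(s)\n' "$unquoted"   # 2
printf 'quoted:   %s argument(s)\n' "$quoted"     # 1
```

With `grep`, the unquoted form would treat `world` as a file name rather than as part of the pattern.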


🧩 Globbing expands unexpectedly

for f in $FILES; do

If $FILES contains * → expands to filesystem.
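
The sketch below shows the expansion on a throwaway directory. The directory and file names are made up for the demonstration:

```shell
workdir=$(mktemp -d)                 # scratch directory for the demo
touch "$workdir/a.txt" "$workdir/b.txt"

FILES="$workdir/*.txt"

unquoted=0
for f in $FILES; do unquoted=$((unquoted + 1)); done    # glob expands: 2 files

quoted=0
for f in "$FILES"; do quoted=$((quoted + 1)); done      # literal string: 1 item

rm -rf "$workdir"
printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
```

Whether expansion is wanted or not, the point is that the unquoted form depends on what happens to be on the runner's filesystem.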


🔧 Techniques

🔧 Always enable strict mode

set -euo pipefail

Adds:

  • -e → exit on error
  • -u → error on unset variables
  • -o pipefail → pipeline fails if any segment fails

🔧 Validate all required environment variables

: "${GIT_SHA:?missing GIT_SHA}"
: "${ENVIRONMENT:?missing ENVIRONMENT}"
: "${REGION:?missing REGION}"
: "${SERVICE:?missing SERVICE}"

🔧 Validate required tools

command -v docker >/dev/null 2>&1 || {
  printf '%s\n' "docker not installed" >&2
  exit 1
}

🔧 Use absolute paths

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

🔧 Log everything to stderr

log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }

🔧 Use trap for cleanup

trap 'cleanup; exit 1' ERR
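
An ERR trap is Bash-specific and runs only when a command fails. A portable variant is a trap on EXIT, which also covers normal termination. The following is a minimal sketch under that assumption; the `scratch` directory and the simulated failure are illustrative, and the job runs in a subshell only so that the effect of the trap can be observed from outside:

```shell
scratch=$(mktemp -d)      # hypothetical scratch directory used by the job
status=0

(
  trap 'rm -rf "$scratch"' EXIT    # runs on success, on failure, and on exit
  touch "$scratch/artifact"
  exit 3                           # simulated failure in the middle of the job
) || status=$?

printf 'job status: %s\n' "$status"
[ -d "$scratch" ] || printf 'scratch directory was removed\n'
```

The exit status of the job is preserved, and the cleanup happens regardless of how the job ended.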

⚠️ Pitfalls

⚠️ Silent failures in pipelines

docker build . | tee build.log

docker build fails → tee succeeds → pipeline exit = 0.


⚠️ Unset variables breaking deployments

echo "Deploying to $ENV"

If $ENV is empty → deploys to wrong environment.


⚠️ Race conditions in parallel CI jobs

⚠️ Using interactive commands

⚠️ Relying on user dotfiles


🚨 Real‑World Failures

🚨 Build passes despite failing tests

🚨 Deployment script deploys to production accidentally

🚨 CI job hangs due to background process


🛠️ Patterns

🛠️ One script per responsibility

🛠️ Idempotent scripts

🛠️ Explicit inputs, explicit outputs

🛠️ Use JSON for structured output

🛠️ Fail fast, fail loud


❌ Anti‑Patterns

  • Using echo for structured data
  • Relying on implicit PATH
  • Running interactive commands
  • Silent failure swallowing
  • Using pipelines for state mutation

10. Deep CI/CD Internals

CI/CD systems are not just “shells that run scripts”. They are distributed orchestration engines with:

  • isolated runners
  • ephemeral filesystems
  • containerized execution
  • strict timeouts
  • parallel job scheduling
  • caching layers
  • artifact storage
  • environment injection
  • secret management
  • sandboxing
  • network isolation

Understanding these internals is essential for writing deterministic shell scripts.


10.1 CI Runner Architecture

Different CI systems use different execution models:

GitHub Actions

  • each step runs in a fresh shell
  • environment does NOT persist between steps
  • default shell: bash on Linux, pwsh on Windows
  • composite actions run in subshells
  • containers run as isolated jobs

GitLab CI

  • default shell: /bin/sh (dash or busybox)
  • jobs run inside Docker or Podman
  • each job has its own filesystem
  • artifacts must be explicitly saved

Jenkins

  • depends on agent
  • may run on bare metal, VM, or container
  • shell may be bash, dash, zsh, or busybox

Azure Pipelines

  • uses bash on Linux
  • PowerShell on Windows
  • jobs run in containers or VMs

CircleCI

  • jobs run inside containers
  • /bin/sh is often busybox
  • minimal environment

10.2 Shell Differences in CI/CD

Many CI/CD systems default to a POSIX sh rather than Bash.

This means:

Feature                Bash            POSIX (dash/ash)
Arrays                 ✅              ❌
Associative arrays     ✅              ❌
Brace expansion        ✅              ❌
Process substitution   ✅              ❌
[[ ]]                  ✅              ❌
echo -e                inconsistent    inconsistent
source                 ✅              ❌ (. only)
mapfile                ✅              ❌
shopt                  ✅              ❌

If your script uses Bash features, you MUST explicitly run:

script:
  - bash script.sh

Otherwise the runner will use /bin/sh.


10.3 Environment Injection

CI/CD systems inject environment variables:

  • repository metadata
  • commit SHA
  • branch name
  • pipeline ID
  • job ID
  • secrets
  • tokens
  • paths

But:

  • values may contain spaces
  • values may contain newlines
  • values may contain JSON
  • values may contain YAML
  • values may contain special characters

This breaks:

  • unquoted expansions
  • word splitting
  • command substitution
  • globbing

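Example:

echo "$GITHUB_EVENT_PATH"

This is the path to a file holding the JSON event payload. Quote the path when expanding it, and quote the JSON as well once it is read into a variable, because it may contain spaces and newlines.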


10.4 Secret Handling

Secrets may contain:

  • newlines
  • spaces
  • quotes
  • backslashes
  • base64
  • PEM blocks

Never do:

echo $SECRET

Always:

printf '%s\n' "$SECRET"

10.5 Filesystem Semantics

CI/CD filesystems are:

  • ephemeral
  • isolated
  • often mounted as tmpfs
  • sometimes read‑only
  • sometimes container overlay layers

This affects:

  • caching
  • artifact storage
  • temporary files
  • cleanup
  • concurrency

11. Advanced Error Semantics

11.1 set -e is NOT enough

set -e does NOT trigger on:

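  • if cmd, while cmd, until cmd (condition context)
  • cmd || true and cmd && other (any command except the last in the list)
  • ! cmd
  • cmd | other (only the last segment counts without pipefail)
  • cmd & (background jobs)
  • local x=$(cmd) and export x=$(cmd) (the status of local/export replaces the status of cmd)
  • functions and subshells invoked from any of the contexts above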

This is why CI/CD scripts MUST use:

set -euo pipefail

11.2 pipefail is mandatory

Without pipefail:

cmd1 | cmd2 | cmd3

If cmd1 fails but cmd3 succeeds → pipeline exit = 0.

This is the #1 cause of silent CI failures.
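
The effect is easy to reproduce. The sketch below runs the same pipeline twice in a child Bash process, with `false | cat` standing in for a failing producer and a succeeding consumer (both are illustrative stand-ins, not commands from a real pipeline):

```shell
# Exit status of the same pipeline without and with pipefail.
# bash -c is used so the demo works even when the calling shell is a plain sh.
without=$(bash -c 'false | cat; echo "$?"')
with=$(bash -c 'set -o pipefail; false | cat; echo "$?"')

printf 'without pipefail: %s\n' "$without"   # 0: the failure of `false` is hidden
printf 'with pipefail:    %s\n' "$with"      # 1: the failure is reported
```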


11.3 Subshells hide failures

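( dangerous_command )

The failure is visible only as the subshell's exit status. Without set -e, or when the subshell is used in an if, && or || context, the parent shell continues.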


11.4 Command substitution hides failures

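x=$(dangerous_command)

A plain assignment keeps the exit status of dangerous_command, but the status is lost when the substitution is an argument to another command: local x=$(...), export x=$(...), and printf '%s\n' "$(...)" all return the status of the outer command.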
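
A common place where the status disappears is a declaration combined with an assignment. The sketch below compares the two forms. `masked_status` and `kept_status` are hypothetical function names, and `local` is assumed to be available (it is not part of POSIX, but bash, dash, and busybox ash provide it):

```shell
masked_status() {
  local out="$(false)"    # the status of `local` (0) replaces the status of `false`
  printf '%s\n' "$?"
}

kept_status() {
  local out
  out="$(false)" || { printf '%s\n' "$?"; return 0; }
  printf '0\n'
}

masked=$(masked_status)
kept=$(kept_status)
printf 'declare and assign together: %s\n' "$masked"   # 0
printf 'declare first, then assign:  %s\n' "$kept"     # 1
```

Declaring first and assigning on a separate line keeps the failure visible to set -e and to explicit checks.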


11.5 Background jobs hide failures

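cmd &

The exit status is detached from the script: & itself always succeeds, and the status of cmd is available only through wait.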
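
The status can be recovered by waiting on the specific PID. A minimal sketch, where the subshell exiting with 7 is a stand-in for a failing background task:

```shell
( sleep 0; exit 7 ) &      # stand-in for a background task that fails
pid=$!

status=0
wait "$pid" || status=$?   # wait returns the exit status of that specific job
printf 'background job exited with %s\n' "$status"   # 7
```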


12. Advanced Pipelines in CI/CD

Pipelines in CI/CD behave differently than interactive shells.


12.1 Each pipeline segment runs in a subshell

count=0
echo "a b c" | while read x; do
  count=$((count+1))
done
echo "$count"   # always 0

12.2 Pipelines break variable propagation

Never do:

echo "$data" | while read line; do
  result="$line"
done

Do:

while IFS= read -r line; do
  result="$line"
done <<EOF
$data
EOF

12.3 Pipelines break set -e

Example:

set -e
cmd1 | cmd2

If cmd1 fails but cmd2 succeeds → the pipeline's status is 0 and set -e does not trigger.


12.4 Pipelines break trap ERR

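trap ERR follows the same rules as set -e: it sees only the pipeline's final status, so a failure in an earlier segment fires the trap only with pipefail. The trap is also not inherited by functions, command substitutions, and subshells unless:

set -o errtrace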

13. Advanced Subshell Behavior

Subshells are everywhere in CI/CD:

  • pipelines
  • command substitution
  • process substitution
  • grouping ( )
  • background jobs
  • subshells inside Docker entrypoints

Subshells:

  • isolate variables
  • isolate traps
  • isolate set -e
  • isolate set -u
  • isolate pipefail
  • isolate working directory

13.1 Subshells break stateful logic

count=0
echo "a b c" | while read _; do
  count=$((count+1))
done
echo "$count"   # 0
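
One way to keep the loop in a pipeline and still get the result is to let the subshell print its final state and capture that output in the parent. A minimal sketch (the variable names are illustrative):

```shell
# Count lines without losing the result: the loop still runs in a subshell,
# but the subshell prints its final state and the parent captures it.
count=$(
  printf '%s\n' a b c | {
    n=0
    while IFS= read -r _line; do
      n=$((n + 1))
    done
    printf '%s\n' "$n"
  }
)
printf 'count=%s\n' "$count"   # count=3
```

State crosses the subshell boundary as data on stdout, never as a variable.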

13.2 Subshells break cleanup

trap 'cleanup' EXIT
(
  dangerous
)
# cleanup not triggered inside subshell

13.3 Subshells break environment mutation

export VAR=1
( export VAR=2 )
echo "$VAR"   # 1

14. Advanced CI/CD Logging

Logging must be:

  • structured
  • timestamped
  • machine‑readable
  • stderr‑based
  • consistent

14.1 Use stderr for logs

log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }

14.2 Use JSON for machine‑readable logs

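printf '{"level":"info","msg":"%s"}\n' "$msg"

This produces valid JSON only when $msg contains no quotes, backslashes, or control characters; escape the message first (for example with jq --arg) when it comes from outside the script.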

14.3 Avoid echo

echo is:

  • non‑portable
  • inconsistent
  • locale‑dependent
  • escapes vary

Use printf.


15. Advanced CI/CD Environment Validation

15.1 Validate required variables

: "${GIT_SHA:?missing}"
: "${ENVIRONMENT:?missing}"
: "${REGION:?missing}"
: "${SERVICE:?missing}"
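
The `${VAR:?}` form stops at the first missing variable. When a job needs several, it can be friendlier to report all of them in one pass. The sketch below is one possible approach; `require_env` is a hypothetical helper, and it relies on `printenv`, which is widely available but not mandated by POSIX:

```shell
# require_env NAME...: report every missing or empty variable, then fail once.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what CI runners inject.
    if [ -z "$(printenv "$name")" ]; then
      printf 'missing required variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use at the top of a job script:
#   require_env GIT_SHA ENVIRONMENT REGION SERVICE || exit 1
```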

15.2 Validate required tools

for tool in docker kubectl jq; do
  command -v "$tool" >/dev/null 2>&1 ||
    { printf '%s not installed\n' "$tool" >&2; exit 1; }
done

15.3 Validate required directories

[ -d "$WORKSPACE" ] ||
  { printf 'missing WORKSPACE\n' >&2; exit 1; }

15.4 Validate required files

[ -f deploy.yaml ] ||
  { printf 'missing deploy.yaml\n' >&2; exit 1; }

16. Advanced CI/CD Secrets Handling

Secrets may contain:

  • newlines
  • spaces
  • quotes
  • JSON
  • YAML
  • PEM blocks
  • base64

Never:

echo $SECRET

Always:

printf '%s\n' "$SECRET"

17. Advanced CI/CD Concurrency

CI/CD systems run jobs:

  • in parallel
  • on different machines
  • in different containers
  • with different environments

This creates:

  • race conditions
  • inconsistent state
  • conflicting deployments
  • corrupted caches
  • partial artifacts

17.1 Use file locks

flock /tmp/lockfile -c "critical_section"

17.2 Use atomic writes

tmp=$(mktemp output.json.XXXXXX)   # same directory as the target, so mv is an atomic rename
generate > "$tmp"
mv "$tmp" output.json
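
The same idea can be wrapped in a reusable function. This is a sketch, not a definitive implementation: `write_atomic` is a hypothetical helper that reads stdin, and it assumes the temporary file is created next to the target so that `mv` is a rename within one filesystem:

```shell
# write_atomic TARGET: read stdin into a temp file next to TARGET, then rename it.
write_atomic() {
  target=$1
  tmp=$(mktemp "$target.XXXXXX") || return 1
  if cat > "$tmp"; then
    mv "$tmp" "$target"        # rename within one directory: readers see old or new, never partial
  else
    rm -f "$tmp"
    return 1
  fi
}

outdir=$(mktemp -d)            # demo directory
printf '{"status":"ok"}\n' | write_atomic "$outdir/output.json"
cat "$outdir/output.json"
```

Note that `mktemp` creates the file with restrictive permissions, so a real script may need a `chmod` before the rename.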

17.3 Use retries with backoff

ok=0
for i in 1 2 3; do
  cmd && { ok=1; break; }
  sleep $((i*i))
done
[ "$ok" -eq 1 ] || exit 1   # without this check, three failures still exit 0
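
A retry loop is easier to reuse as a function that returns a failure once the attempts are exhausted. The sketch below is one way to write it; `retry` and the `RETRY_DELAY` variable are hypothetical names, and the quadratic delay mirrors the loop above:

```shell
# retry MAX_ATTEMPTS COMMAND...: rerun COMMAND until it succeeds or attempts run out.
retry() {
  max_attempts=$1; shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      printf 'giving up after %s attempts: %s\n' "$attempt" "$*" >&2
      return 1
    fi
    # Quadratic backoff: RETRY_DELAY (default 1 second) times attempt squared.
    sleep $(( ${RETRY_DELAY:-1} * attempt * attempt ))
    attempt=$((attempt + 1))
  done
}

# Typical use:
#   retry 3 curl -fsS "$URL" || exit 1
```

Only retry operations that are safe to repeat; a retried non-idempotent deployment step can make a failure worse.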

18. Advanced CI/CD Caching

Caching in CI/CD is fundamentally different from caching on a developer machine. It is:

  • non‑deterministic — runners differ between jobs
  • inconsistent across machines — different OS images, different shells
  • invalidated by timestamps — filesystem metadata changes unpredictably
  • invalidated by permissions — UID/GID differ between runners
  • invalidated by environment differences — locale, PATH, tool versions
  • sensitive to container layers — overlayfs behaves differently
  • sensitive to concurrency — parallel jobs corrupt shared caches

Caching must be explicit, content‑addressed, validated, and reproducible.


18.1 Use content‑addressable caching

The only reliable cache key is a hash of the inputs.

Example:

key=$(sha256sum requirements.txt | cut -d' ' -f1)
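
When the cache depends on more than one input, the per-file hashes can be hashed again to get a single key. A minimal sketch, assuming `sha256sum` is available; `cache_key`, `requirements-dev.txt`, and the pinned versions are illustrative:

```shell
# cache_key FILE...: one key derived from the names and contents of all inputs.
cache_key() {
  sha256sum "$@" | sha256sum | cut -d' ' -f1
}

demo=$(mktemp -d)
printf 'requests==2.31.0\n' > "$demo/requirements.txt"
printf 'pytest==8.0.0\n'    > "$demo/requirements-dev.txt"

# cd first so the hashed file names are relative and identical on every runner.
key_before=$(cd "$demo" && cache_key requirements.txt requirements-dev.txt)
printf 'pytest==8.1.0\n' > "$demo/requirements-dev.txt"     # change one input
key_after=$(cd "$demo" && cache_key requirements.txt requirements-dev.txt)

printf 'before: %s\nafter:  %s\n' "$key_before" "$key_after"
```

Passing the files in a fixed order matters, because the order is part of what gets hashed.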

This ensures:

  • cache invalidates when inputs change
  • cache persists when inputs are identical
  • cache is independent of timestamps
  • cache is independent of runner environment

Content‑addressable caching is the only deterministic strategy.


18.2 Avoid timestamp‑based caching

Timestamp‑based caching fails in CI/CD because:

  • files are restored with fresh timestamps
  • container layers rewrite timestamps
  • Git checkouts normalize timestamps
  • artifact downloads rewrite timestamps
  • runners may use different timezones
  • clocks may drift between machines

Example of a broken pattern:

if [ deploy.yaml -nt deploy.cache ]; then
  regenerate_cache
fi

This fails because:

  • deploy.yaml may appear “newer” even if unchanged
  • deploy.cache may appear “older” after restore
  • overlayfs may rewrite mtime on extraction

Never rely on -nt, -ot, or mtime in CI/CD.


18.3 Avoid relying on filesystem metadata

Filesystem metadata is not stable across CI/CD runners:

  • permissions differ
  • ownership differs
  • inode numbers differ
  • mtime/ctime differ
  • overlayfs rewrites metadata
  • Docker COPY normalizes metadata
  • Git checkouts normalize metadata

Broken pattern:

if [ "$(stat -c %Y build/)" -eq "$LAST_BUILD_TS" ]; then
  echo "Cache valid"
fi

This fails because:

  • %Y (mtime) changes on extraction
  • %U (owner) differs between runners
  • %G (group) differs
  • %i (inode) differs
  • %n (link count) differs

Metadata is not portable. Never use it for caching logic.


18.4 Always validate cache contents

Caches must be validated, not trusted.

Example:

if [ -f cache.tar.gz ]; then
  if tar -tzf cache.tar.gz >/dev/null 2>&1; then
    echo "Cache OK"
  else
    echo "Cache corrupted, rebuilding"
    rm -f cache.tar.gz
  fi
fi

Validation strategies:

  • hash validation — store a checksum next to the cache
  • structural validation — ensure required files exist
  • semantic validation — ensure versions match
  • toolchain validation — ensure compiler/interpreter versions match

Example semantic validation:

cached_version=$(cat .cache/python-version 2>/dev/null || echo none)
current_version=$(python3 --version)

if [ "$cached_version" != "$current_version" ]; then
  echo "Python version changed, invalidating cache"
  rm -rf .cache
fi

Example structural validation:

required_files=(
  ".cache/venv/bin/python"
  ".cache/venv/lib/python3.11/site-packages"
)

for f in "${required_files[@]}"; do
  [ -e "$f" ] || { echo "Cache incomplete"; rm -rf .cache; break; }
done

Never assume a cache is valid just because it exists.

19. Advanced Real‑World Failures

Real CI/CD failures are rarely caused by “syntax errors”. They come from:

  • subshell isolation
  • hidden exit codes
  • race conditions
  • corrupted caches
  • broken pipelines
  • missing environment
  • inconsistent runners
  • zombie processes
  • PID1 behavior in containers
  • newline stripping
  • word splitting
  • globbing
  • tool version drift

Below is a list of real production incidents, with root-cause analysis and fixes.


19.1 Failure: Pipeline passes despite failing tests

Symptom

npm test | tee test.log

npm test fails → tee succeeds → pipeline exit = 0.

Root cause

  • pipeline exit code = exit code of last command
  • tee almost always exits 0
  • set -e alone does NOT fix this
  • trap ERR sees only the pipeline's final status, so it does not fire either

Fix

set -o pipefail
npm test | tee test.log

19.2 Failure: Deployment script deploys to production accidentally

Symptom

kubectl apply -f deploy.yaml

Root cause

  • missing environment validation
  • $ENV empty or wrong
  • CI runner uses default kubeconfig
  • deployment goes to wrong cluster

Fix

: "${ENVIRONMENT:?missing ENVIRONMENT}"
kubectl --context="cluster-$ENVIRONMENT" apply -f deploy.yaml
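
Checking that the variable is set does not protect against a wrong value. An allow-list adds that check. In the sketch below, `validate_environment` is a hypothetical helper and the names `dev`, `staging`, and `production` are placeholders for whatever environments actually exist:

```shell
# validate_environment NAME: accept only environments from an explicit allow-list.
validate_environment() {
  case "$1" in
    dev|staging|production) return 0 ;;
    *)
      printf 'refusing to deploy: unknown environment "%s"\n' "$1" >&2
      return 1
      ;;
  esac
}

# Typical use before any kubectl call:
#   validate_environment "$ENVIRONMENT" || exit 1
```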

19.3 Failure: CI job hangs due to background process

Symptom

long_task &
exit 0

Root cause

  • background job keeps running
  • runner waits for all processes
  • job never terminates

Fix

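long_task & task_pid=$!
trap 'kill "$task_pid" 2>/dev/null' EXIT   # kill 0 would also signal every other process in the group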

19.4 Failure: Multi‑line secrets break scripts

Symptom

echo $SECRET

SECRET contains:

  • newlines
  • quotes
  • PEM blocks

The unquoted expansion is split into words and rejoined with single spaces, so line breaks are lost and the output is corrupted.

Fix

printf '%s\n' "$SECRET"

19.5 Failure: JSON corrupted by command substitution

Symptom

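payload=$(cat payload.json)

Command substitution strips trailing newlines → the payload no longer matches the file byte for byte, which breaks checksums, signatures, and any consumer that expects the final newline.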

Fix

payload=$(cat payload.json; printf x)
payload=${payload%x}

or, in Bash, when only the extra cat process is the concern (trailing newlines are still stripped):

payload=$(<payload.json)

19.6 Failure: Race condition between parallel CI jobs

Symptom

Two jobs write to same cache directory.

Root cause

  • no locking
  • no atomic writes
  • shared filesystem

Fix

flock /tmp/cache.lock -c "critical_section"

19.7 Failure: Wrong tool version used

Symptom

python3 is different on different runners.

Fix

Validate:

python3 --version

Fail if mismatch.
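
The comparison itself can be done with a `case` pattern. This is a sketch under the assumption that the tool prints its version as the last word of a line such as `Python 3.11.4`; `version_matches` is a hypothetical helper and `3.11` is an example pin:

```shell
# version_matches EXPECTED ACTUAL: true if ACTUAL ends in " EXPECTED" or
# contains " EXPECTED." (so 3.11 matches "Python 3.11.4" but not "Python 3.110.0").
version_matches() {
  case "$2" in
    *" $1" | *" $1."*) return 0 ;;
    *) return 1 ;;
  esac
}

version_matches 3.11 'Python 3.11.4' && printf '3.11.4: match\n'
version_matches 3.11 'Python 3.12.1' || printf '3.12.1: mismatch\n'

# Typical use:
#   actual=$(python3 --version 2>&1)
#   version_matches 3.11 "$actual" || { printf 'expected Python 3.11, got: %s\n' "$actual" >&2; exit 1; }
```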


19.8 Failure: Docker build uses stale cache

Symptom

Docker layer cache invalidated unpredictably.

Fix

Use content‑addressable cache keys.


20. Advanced Debugging Techniques

Debugging CI/CD scripts requires visibility, not guesswork.


20.1 Enable debug mode conditionally

DEBUG=${DEBUG:-0}
[ "$DEBUG" = 0 ] || set -x   # POSIX test, so it also works when the runner uses sh

20.2 Use PS4 for deep tracing

export PS4='+ ${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
set -x

Shows:

  • file
  • line
  • function
  • expanded command

20.3 Trace syscalls

strace -f -e trace=process,desc sh script.sh

Useful for:

  • FD leaks
  • zombie processes
  • missing files
  • permission issues

20.4 Trace signals

strace -f -e trace=signal -p "$PID"

Useful for:

  • SIGTERM not delivered
  • PID1 ignoring signals
  • stuck containers

20.5 Inspect environment snapshot

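env | sort >&2

This prints secret values as well, so use it only where logs are private or the CI system masks secrets.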

Useful for:

  • missing variables
  • wrong values
  • overwritten secrets

20.6 Inspect file descriptors

ls -l /proc/$$/fd

Useful for:

  • leaked pipes
  • leaked sockets
  • broken redirections

20.7 Inspect subshell boundaries

Add:

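echo "PID=$$ BASHPID=$BASHPID" >&2

$$ keeps the PID of the main shell even inside subshells; in Bash, $BASHPID changes at every subshell boundary.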

Useful for:

  • pipeline subshells
  • command substitution subshells
  • Docker entrypoint subshells

21. Advanced Performance Engineering

CI/CD performance is not about CPU — it’s about:

  • avoiding unnecessary forks
  • avoiding unnecessary clones
  • avoiding unnecessary downloads
  • avoiding unnecessary rebuilds
  • avoiding unnecessary pipelines
  • avoiding unnecessary subshells

21.1 Avoid slow loops

Bad:

for f in $(ls); do
  process "$f"
done

Good:

find . -type f -print0 | xargs -0 -n1 process

21.2 Avoid unnecessary cloning

Use shallow clones:

git clone --depth=1

21.3 Avoid unnecessary scanning

Cache dependency graphs.


21.4 Avoid unnecessary forks

Bad:

for f in *; do
  echo "$f"
done

Good:

printf '%s\n' *

21.5 Use streaming tools

  • jq
  • awk
  • sed
  • grep

These avoid loading entire files into memory.


22. Advanced Process Control in CI/CD

Process control is critical in CI/CD because:

  • runners send SIGTERM on timeout
  • containers use PID1 semantics
  • background jobs cause hangs
  • subshells isolate traps
  • pipelines break error propagation

22.1 Always handle SIGTERM

trap 'cleanup; exit 143' TERM   # 128 + 15: report the interruption instead of success
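
The behavior can be checked end to end by signalling a child shell. In the sketch below, the `MARKER` file, the `on_term` handler, and the `sleep 30` worker are illustrative stand-ins. The handler exits with 143 (128 + 15) rather than 0 so that an interrupted job is not reported as successful:

```shell
MARKER=$(mktemp)          # file the child writes to when its cleanup runs
export MARKER

sh -c '
  cleanup() { echo cleaned > "$MARKER"; }
  on_term() {
    kill "$child" 2>/dev/null   # stop the worker first
    cleanup
    exit 143                    # 128 + 15 (SIGTERM)
  }
  trap on_term TERM
  sleep 30 &                    # stand-in for a long-running task
  child=$!
  wait "$child"                 # wait is interruptible, so the trap runs promptly
' &
job=$!

sleep 1                   # give the child shell time to install its trap
kill -TERM "$job"         # what a runner does on timeout or cancellation
status=0
wait "$job" || status=$?

printf 'exit status: %s, marker: %s\n' "$status" "$(cat "$MARKER")"
```

Running the worker in the background and waiting on it matters: a foreground command would delay the trap until the command finished.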

22.2 Always reap children

wait   # call before exiting; a CHLD trap that runs wait blocks on every child that is still running

22.3 Avoid background jobs unless necessary

Background jobs:

  • hide exit codes
  • break determinism
  • cause hangs
  • cause zombie accumulation

22.4 Use exec when launching long‑running processes

exec app "$@"

Ensures:

  • correct signal forwarding
  • no zombie shell
  • no double PID tree

23. Advanced Signals in CI/CD

CI/CD runners send:

  • SIGTERM → graceful shutdown
  • SIGKILL → forced shutdown
  • SIGINT → manual cancellation

Containers often ignore SIGTERM unless:

  • PID1 forwards signals
  • entrypoint uses exec
  • init system (tini/dumb-init) is used

23.1 Detecting signal handling issues

strace -f -e trace=signal -p 1

23.2 Forwarding signals manually

app "$@" &
child=$!
trap 'kill -TERM "$child"' TERM
wait "$child"

24. Advanced Containers Interactions

CI/CD often runs inside containers. This introduces:

  • PID1 semantics
  • zombie reaping
  • signal forwarding
  • overlayfs behavior
  • missing environment
  • missing PATH
  • missing locale

24.1 Always use exec in entrypoints

exec app "$@"

24.2 Use tini or dumb-init

ENTRYPOINT ["/usr/bin/tini", "--"]

24.3 Validate environment inside container

: "${CONFIG_PATH:?missing}"

24.4 Avoid long‑running loops

Containers must exit cleanly.


25. Advanced CI/CD Lifecycle

A CI/CD job has phases:

  1. environment injection
  2. workspace preparation
  3. dependency installation
  4. build
  5. test
  6. packaging
  7. deployment
  8. cleanup

Each phase must be:

  • deterministic
  • idempotent
  • validated
  • logged
  • fail‑fast
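
These requirements can be enforced by a small wrapper around each phase. The following is a sketch, not a prescribed structure; `run_phase` is a hypothetical helper and the script names in the usage comment are placeholders:

```shell
# run_phase NAME COMMAND...: log the phase, run it, and propagate its exit status.
run_phase() {
  phase=$1; shift
  printf '[%s] start\n' "$phase" >&2
  if "$@"; then
    printf '[%s] ok\n' "$phase" >&2
  else
    rc=$?
    printf '[%s] failed with status %s\n' "$phase" "$rc" >&2
    return "$rc"
  fi
}

# Hypothetical wiring; build.sh, test.sh and deploy.sh are placeholder names:
#   run_phase build  ./build.sh  || exit 1
#   run_phase test   ./test.sh   || exit 1
#   run_phase deploy ./deploy.sh || exit 1
```

Logging to stderr keeps stdout free for data that a later phase might consume.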

26. Full Patterns

26.1 Pattern: One script per responsibility

26.2 Pattern: Idempotent scripts

26.3 Pattern: Explicit inputs, explicit outputs

26.4 Pattern: JSON for structured output

26.5 Pattern: Fail fast, fail loud

26.6 Pattern: Validate everything

26.7 Pattern: Use exec for long‑running processes

26.8 Pattern: Use content‑addressable caching

26.9 Pattern: Use atomic writes

26.10 Pattern: Use file locks


27. Full Anti‑Patterns

  • Using echo for structured data
  • Relying on implicit PATH
  • Running interactive commands
  • Silent failure swallowing
  • Using pipelines for state mutation
  • Using timestamp‑based caching
  • Using metadata‑based caching
  • Using background jobs without cleanup
  • Using shell as PID1
  • Using tail -f as main process

28. Summary

CI/CD shell scripting requires:

  • strict mode
  • deterministic behavior
  • explicit validation
  • predictable exit codes
  • safe pipelines
  • structured logging
  • idempotency
  • no interactive assumptions
  • correct signal handling
  • correct process control
  • correct subshell behavior
  • correct caching
  • correct environment validation
  • correct container semantics

Mastering these techniques produces reliable, reproducible, production‑grade pipelines.