🧵 Advanced Pipelines
🧠 Overview
This module goes deep into how pipelines really work:
- how the shell builds the process + FD graph
- how exit codes are chosen (pipefail, last command, etc.)
- how buffering and backpressure work
- how SIGINT/SIGPIPE propagate
- why some pipelines hang “randomly”
- how to design production‑grade pipelines for CI/CD and data processing
The goal: when you see cmd1 | cmd2 | cmd3, you don’t think “three commands in a row”, you think “three processes + two pipes + specific signal/exit semantics”.
🎓 Who this is for
- DevOps/SRE building data pipelines, log processing, or CI chains.
- Engineers who rely on grep | awk | sed | jq | ... in critical scripts.
- Anyone who has ever seen:
  - a pipeline hang forever,
  - a partial result,
  - or a “broken pipe” at the wrong place.
You should already understand:
- basic shell scripting
- what fork/exec do (see: Execve & Fork Internals)
- basic process control (see: Advanced Process Control)
🧩 Internals / Mechanics
🧩 How the shell builds a pipeline
For a three-stage pipeline such as cmd1 | cmd2 | cmd3, the shell typically:
- Creates N‑1 pipes (here: 2 pipes).
- Forks N children (here: 3 processes).
- In each child:
  - wires stdin/stdout via dup2() to the appropriate pipe ends,
  - closes unused FDs,
  - execve()s the target command.
- In the parent:
  - closes all pipe FDs,
  - tracks PIDs,
  - waits according to its pipeline semantics.
Key point: each stage is its own process, with its own buffering, signals, and exit code.
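A small sketch makes the "own process" point visible: a variable changed inside a pipeline stage does not survive the pipeline, because the stage ran in a child process. This assumes Bash with default settings (lastpipe off); shells such as zsh or ksh run the last stage in the current shell and behave differently.

```shell
#!/usr/bin/env bash
# Count lines inside the last stage of a pipeline.
count=0
printf 'a\nb\nc\n' | while read -r _; do
  count=$((count + 1))
done

# The loop ran in a child process, so the parent's variable is untouched.
echo "count after pipeline: $count"   # prints 0 in Bash with default settings
```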
🧩 Exit status of a pipeline
Default (POSIX‑ish, many shells), for cmd1 | cmd2 | cmd3:
- $? is the exit code of the last command (cmd3).
Bash with set -o pipefail:
- $? is the exit code of the last (rightmost) command that exited non‑zero,
- or 0 if all succeeded.
This is critical in CI/CD:
- without pipefail, cmd1 can fail silently if cmd3 succeeds.
- with pipefail, the pipeline fails if any stage fails.
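The difference is easy to check with a toy pipeline. The sketch below uses subshells with fixed exit codes as stand-ins for real stages (they are placeholders, not commands from this module) and assumes Bash.

```shell
#!/usr/bin/env bash
# Stand-in stages: stage 1 exits 2, stage 2 exits 3, stage 3 succeeds.
(exit 2) | (exit 3) | true
default_status=$?          # 0: only the last stage counts

set -o pipefail
(exit 2) | (exit 3) | true
pipefail_status=$?         # 3: the rightmost non-zero status wins
set +o pipefail

echo "default=$default_status pipefail=$pipefail_status"
```

Note that pipefail reports one status only; it does not tell you which stage failed.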
🧩 Buffering and backpressure
Pipes are bounded buffers (64 KiB by default on Linux; other systems differ).
- If cmd1 writes faster than cmd2 reads:
  - the pipe fills,
  - cmd1 blocks on write,
  - backpressure propagates upstream.
- If cmd2 is slow or stuck:
  - the whole pipeline can appear “hung”.
Also:
- many tools (e.g. grep, awk, python) buffer differently depending on whether stdout is a TTY or a pipe.
- line buffering vs block buffering can change perceived latency.
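When a filter sits in the middle of a live pipeline, block buffering can hold matches back until a full block accumulates. A minimal sketch, assuming a grep that supports --line-buffered (GNU and BSD grep do); emit_log is a hypothetical stand-in for a real log source:

```shell
#!/usr/bin/env bash
# Hypothetical log source; in real use this might be a long-running follower.
emit_log() {
  printf '%s\n' 'info start' 'error disk full' 'info retry' 'error timeout'
}

# --line-buffered makes grep flush after every matching line instead of
# waiting for a full block, so the next stage sees matches promptly.
matches=$(emit_log | grep --line-buffered 'error' | sed 's/^/seen: /')
printf '%s\n' "$matches"
```

With a finite input like this the final output is the same either way; the flag only changes when each line arrives. For tools without such a flag, GNU coreutils stdbuf -oL is a common workaround, though it only affects programs that use C stdio buffering.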
🧩 SIGPIPE and early termination
If a downstream process exits early in producer | consumer:
- consumer exits.
- producer writes to a pipe with no reader.
- kernel sends SIGPIPE to producer.
- default behavior: producer is killed by the signal, and the shell reports exit code 141 (128 + 13).
This is normal, but can be surprising.
Example: yes | head -n 1
- head exits after 1 line.
- yes gets SIGPIPE and dies.
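The per-stage statuses can be inspected in Bash through PIPESTATUS. This is a sketch that assumes SIGPIPE has its default disposition; if the environment ignores SIGPIPE, yes fails with a write error instead and reports a different non-zero code.

```shell
#!/usr/bin/env bash
# yes writes forever; head exits after one line and closes the read end.
yes | head -n 1 >/dev/null
stage_status=("${PIPESTATUS[@]}")

# Typically yes=141 (128 + SIGPIPE) and head=0.
echo "yes=${stage_status[0]} head=${stage_status[1]}"
```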
🧩 Pipelines and process groups
In interactive shells:
- the whole pipeline is usually placed in one process group.
- Ctrl‑C (SIGINT) goes to the foreground process group → all stages.
In non‑interactive scripts:
- job control is usually disabled,
- so pipeline processes normally stay in the script’s own process group instead of getting a new one.
This matters for:
- signal propagation,
- clean shutdown,
- CI behavior.
See also: Advanced Process Control.
🔧 Techniques
🔧 Use set -o pipefail in non‑trivial pipelines
In scripts, run set -o pipefail near the top, before the first pipeline.
This ensures:
- pipelines fail if any stage fails,
- not just the last one.
🔧 Make failure explicit in middle stages
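Example: a pipeline whose first stage is build and whose last stage is deploy.
If build fails but deploy still runs, you’re in trouble.
Better: enable pipefail, so that the failure is at least reported. Note that deploy still runs, because all stages start concurrently.
Or even: run build and deploy as separate commands, and start deploy only after build has succeeded.
So that failure in build is clearly separated from deploy.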
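One way to keep a failing build away from deploy is to run the two as separate commands with a file in between. The build and deploy functions below are hypothetical placeholders; this is a sketch of the shape, not a real deployment script.

```shell
#!/usr/bin/env bash
# Hypothetical stages: build fails halfway, deploy consumes whatever it is given.
build()  { echo "step 1 ok"; echo "step 2 failed" >&2; return 1; }
deploy() { cat >/dev/null; echo "deployed"; }

deployed=no
build_log=$(mktemp)

# Run build on its own so its exit status can be checked before deploy starts.
if build >"$build_log" 2>&1; then
  deploy <"$build_log" && deployed=yes
else
  echo "build failed, skipping deploy" >&2
fi
rm -f "$build_log"
```

The cost is that deploy no longer streams; it starts only once build has finished.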
🔧 Use xargs / parallel instead of naive loops
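Instead of a shell loop that processes one item per iteration, consider feeding the items to xargs (or GNU parallel) and letting it run the command, optionally several at a time.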
Architecturally:
- you move from “shell‑driven loop” to “data‑driven worker pool”.
- but you must understand how exit codes propagate (xargs has its own semantics).
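A rough sketch of the worker-pool idea and of how xargs reports failures. It assumes an xargs with -0 and -P (GNU and BSD both have them); the inline sh -c worker is a hypothetical stand-in that fails for one item.

```shell
#!/usr/bin/env bash
# Hypothetical worker: succeeds for every item except "b".
# Items are NUL-separated so names with spaces survive; -P 2 runs two workers at once.
printf '%s\0' a b c |
  xargs -0 -n 1 -P 2 sh -c '[ "$1" != b ]' worker
xargs_status=$?

# GNU xargs documents exit status 123 when any invocation exits with 1-125;
# other implementations may use a different non-zero value.
echo "xargs exit status: $xargs_status"
```

The status says that some invocation failed, not which one; per-item logging has to happen inside the worker.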
🔧 Use tee to branch pipelines
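To both log and process, put tee between two stages: it writes a copy of the stream to a file and passes the same data downstream.
To split one stream into several consumers, combine tee with process substitution (implementation‑dependent; process substitution may spawn subshells).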
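A minimal sketch of the log-and-process case. The input lines and the temporary log file are placeholders for illustration.

```shell
#!/usr/bin/env bash
log_file=$(mktemp)

# tee writes every line to the log file and passes the same lines downstream.
line_count=$(printf '%s\n' alpha beta gamma | tee "$log_file" | wc -l)

echo "processed $line_count lines, logged $(wc -l <"$log_file") lines"
```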
⚠️ Pitfalls
⚠️ Silent failures in early stages
In a pipeline such as generate | upload, if generate fails but upload exits 0, you might:
- upload partial data,
- or nothing at all, but still “succeed”.
Without pipefail, $? only reflects upload.
⚠️ Hanging pipelines due to open FDs
If any process keeps a write end of a pipe open:
- readers never see EOF,
- pipeline appears hung.
Common causes:
- parent shell not closing pipe FDs,
- extra processes inheriting FDs (no CLOEXEC),
- tools that fork internally and keep FDs open.
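The inherited-FD case can be reproduced in a few lines. In this sketch a background sleep stands in for any helper process that a stage leaves behind; timings use Bash's SECONDS variable, so they are whole seconds.

```shell
#!/usr/bin/env bash
# Case 1: the background sleep inherits the pipe's write end, so cat does not
# see EOF until sleep exits, even though the stage itself finished at once.
SECONDS=0
{ sleep 2 & echo "stage done"; } | cat >/dev/null
held=$SECONDS

# Case 2: the background process has its stdout redirected away from the pipe,
# so the write end closes as soon as the stage exits.
SECONDS=0
{ sleep 2 >/dev/null & echo "stage done"; } | cat >/dev/null
released=$SECONDS

echo "inherited FD: ${held}s, redirected: ${released}s"
```

Replace sleep 2 with a daemon that never exits and case 1 becomes a pipeline that never finishes.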
⚠️ Mixing TTY‑dependent behavior
Some tools:
- behave differently when stdout is a TTY vs a pipe,
- change buffering,
- change formatting (colors, progress bars).
This can break scripts when moved from interactive use to CI.
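Scripts can make this decision explicit instead of leaving it to each tool. A minimal sketch using the standard [ -t 1 ] test; output_mode is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Pick an output mode from what stdout is connected to ([ -t 1 ] tests FD 1).
output_mode() {
  if [ -t 1 ]; then
    echo "color"
  else
    echo "plain"
  fi
}

# Inside $(...) stdout is a pipe, so this takes the non-TTY branch, as in CI.
mode_in_pipe=$(output_mode)
echo "mode when piped: $mode_in_pipe"
```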
⚠️ Over‑pipelining
Deep chains of many stages on a single line are:
- harder to debug,
- more fragile,
- more sensitive to buffering and partial failures.
Sometimes a small script in Python/Go/Rust is clearer and safer.
🚨 Real‑world failures
🚨 Failure: CI pipeline “hangs randomly”
Scenario: producer | consumer | uploader
- uploader exits early on error.
- consumer exits when uploader closes its input.
- producer keeps writing, but:
  - some process still holds the read end of its pipe open without reading, so the pipe fills and producer blocks instead of receiving SIGPIPE,
  - or SIGPIPE is ignored/handled badly.
Result: CI job hangs.
Root causes:
- FDs not closed properly.
- No pipefail.
- No explicit error handling.
🚨 Failure: Partial deploy with green status
In a pipeline where build feeds deploy:
- build fails halfway.
- deploy reads partial log, still exits 0.
- CI marks job as success.
Fix:
- set -o pipefail.
- Or split stages, so that deploy starts only after build has succeeded.
🚨 Failure: “Broken pipe” spam in logs
In producer | head:
- head exits after 10 lines.
- producer gets SIGPIPE, logs stack traces or errors.
Fix:
- treat SIGPIPE as normal in this context,
- or adjust logging to ignore it when expected.
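A sketch of treating SIGPIPE as normal for one specific pipeline while keeping pipefail on. run_preview is a hypothetical wrapper, yes stands in for the real producer, and the code assumes Bash with SIGPIPE at its default disposition.

```shell
#!/usr/bin/env bash
set -o pipefail

run_preview() {
  yes "log line" | head -n 10
  local status=$?
  # 141 = 128 + 13: the producer was stopped by SIGPIPE, which is expected here.
  if [ "$status" -eq 141 ]; then
    return 0
  fi
  return "$status"
}

preview=$(run_preview)
preview_status=$?
echo "preview exit status: $preview_status"
```

Only 141 is mapped to success; any other failure of the producer still propagates.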
🛠️ Patterns
🛠️ Pattern: Short, named pipelines
Instead of one long inline pipeline, wrap each stage (or small group of stages) in a named shell function and compose those functions.
Benefits:
- easier to test,
- easier to reuse,
- easier to extend.
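What named stages can look like, as a sketch: the function names and the sample log lines are made up for illustration, and each function reads stdin and writes stdout so it can be tested on its own.

```shell
#!/usr/bin/env bash
# Each stage is a small filter with a name that says what it does.
extract_errors() { grep -i 'error'; }
normalize()      { tr '[:upper:]' '[:lower:]' | sed 's/[0-9][0-9]*/N/g'; }
top_messages()   { sort | uniq -c | sort -rn | head -n "${1:-5}"; }

report=$(
  printf '%s\n' 'ERROR disk 1' 'info ok' 'Error disk 2' |
    extract_errors | normalize | top_messages 1
)
echo "$report"
```

The top-level pipeline now reads as a description of the data flow, and each stage can be fed test input separately.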
🛠️ Pattern: Validate inputs before pipelines
Before starting a pipeline, check that its inputs exist and are usable, and exit with a clear error if they are not.
Architecturally: fail fast before building complex process graphs.
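A fail-fast check can be as small as one test in front of the pipeline. In this sketch summarize and the file path are hypothetical; the point is that no process is forked until the input has been validated.

```shell
#!/usr/bin/env bash
summarize() {
  local input=$1
  # Validate first: refuse to start the pipeline on a missing or unreadable file.
  if [ ! -r "$input" ]; then
    echo "summarize: cannot read input: $input" >&2
    return 1
  fi
  sort "$input" | uniq -c
}

summarize /nonexistent/input.txt 2>/dev/null
missing_status=$?
echo "status for missing input: $missing_status"
```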
🛠️ Pattern: Use logs as first‑class artifacts
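Instead of piping one stage directly into the next, consider also saving the intermediate stream to a log file (for example with tee).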
So you can:
- debug failures post‑mortem,
- replay data through later stages.
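Replay is the main payoff: once the intermediate data is a file, later stages can be re-run without repeating the earlier ones. A sketch with made-up records and a temporary file as the artifact:

```shell
#!/usr/bin/env bash
artifact=$(mktemp)

# Stage 1 (hypothetical): produce data and keep it as a file, not only as a stream.
printf '%s\n' 'id=1 ok' 'id=2 fail' 'id=3 ok' >"$artifact"

# Later stages read the artifact; they can be re-run without repeating stage 1.
failures=$(grep -c 'fail' "$artifact")
successes=$(grep -c 'ok' "$artifact")
echo "failures=$failures successes=$successes"
```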
❌ Anti‑patterns
- giant, unreadable one‑liner pipelines in critical scripts
- relying on default exit‑code semantics without pipefail
- ignoring SIGPIPE and treating it as “unexpected error”
- using pipelines where a small script would be clearer
- mixing interactive and non‑interactive assumptions (colors, prompts, paging)
🔍 Debugging
🔍 Trace processes and FDs
Use a syscall tracer such as strace, following child processes, to see:
- which processes are spawned,
- who reads/writes which FDs,
- where things block.
🔍 Inspect process tree live
Use a process‑tree view, for example ps or pstree, to see:
- which stages are still running,
- whether something is stuck upstream.
🔍 Check exit codes of all stages
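In Bash, the PIPESTATUS array holds the exit codes of each stage of the most recent pipeline. Read it immediately after the pipeline, before another command overwrites it.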
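A minimal Bash sketch, using subshells with fixed exit codes as stand-ins for real stages:

```shell
#!/usr/bin/env bash
(exit 1) | (exit 2) | true
# Copy the array at once: the next command replaces PIPESTATUS.
stage_status=("${PIPESTATUS[@]}")

echo "stage exit codes: ${stage_status[*]}"   # prints: stage exit codes: 1 2 0
```

Unlike pipefail, this shows which stage failed, at the cost of being Bash-specific.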
🧠 Summary
Advanced pipelines are not “just a bunch of commands with | between them”.
They are:
- process graphs (multiple PIDs),
- FD graphs (pipes, redirections),
- exit‑code semantics (last vs pipefail),
- buffering and backpressure,
- signal propagation (SIGINT, SIGPIPE).
Once you think of pipelines this way, you can design:
- non‑hanging,
- correctly failing,
- observable,
- production‑grade shell pipelines for CI/CD and data processing.