Advanced Shell in Containers
Overview
Shell scripts inside containers behave differently than on a normal Linux host. Containers change:
- PID hierarchy
- signal delivery
- process groups
- environment initialization
- filesystem layout
- entrypoint semantics
- zombie reaping
- logging and stdout/stderr behavior
This module explains how to write robust, production-grade shell scripts that run correctly inside Docker, Kubernetes, Nomad, ECS, Swarm, and containerized CI/CD environments.
Who this is for
- DevOps/SRE building container images or entrypoints
- Engineers writing startup scripts, health checks, or lifecycle hooks
- Anyone debugging signal handling, zombie processes, or shutdown issues
- People deploying applications in Kubernetes, Nomad, ECS, or Docker Swarm
- Engineers writing Docker entrypoints, sidecars, init containers, or CI containers
Role in the Ecosystem
Container shell behavior interacts with:
- Process Control
- Subshells & Environment
- Advanced Pipelines
- Advanced Error Handling
- Shell in CI/CD
- POSIX Shell Compatibility
Containers are where PID 1 semantics, signals, and zombies become very real.
1. Internals / Mechanics
1.1 PID 1 is special
Inside a container, the entrypoint becomes PID 1.
PID 1 has unique semantics:
- ignores any signal it has not installed a handler for, including SIGTERM
- does not automatically reap zombie processes
- does not forward signals unless explicitly implemented
- is responsible for clean shutdown
- becomes the parent of any orphaned process in the container
This is the root cause of many container bugs:
- pods stuck in Terminating
- containers ignoring SIGTERM
- zombie processes accumulating
- CI jobs hanging forever
1.2 Shell as PID 1 is dangerous
If your entrypoint runs the application through sh, sh becomes PID 1 and:
- does not reap children
- may ignore SIGTERM
- may not forward signals to the app
- may leave zombies
- may cause slow or broken shutdowns
- may swallow exit codes
This is especially bad when:
- the shell spawns background jobs
- the shell uses pipelines
- the shell uses subshells
- the shell does not use exec
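1.3 Exec vs non-exec entrypoints
Bad: the script runs the application as an ordinary command and waits for it.
Good: the script starts the application with exec.
exec replaces the shell with the application: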
- no extra shell process
- correct signal handling
- no zombie shell
- PID 1 is the actual app
- simpler process tree
Without exec, you get:
- PID 1 = shell
- app is child of shell
- SIGTERM goes to shell, not app
- shell may ignore or mishandle signals
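The difference is easy to observe by printing PIDs. The sketch below is a demonstration only, not a production entrypoint: the inner sh -c stands in for a real application, and the function names run_without_exec and run_with_exec are made up for illustration.

```shell
#!/bin/sh
# Each function starts a wrapper shell that prints its own PID ($$),
# then starts a stand-in "app" (another sh) that prints its PID.

# Without exec: the wrapper forks the app and stays alive as its parent,
# so the two PIDs differ. The trailing `exit $?` passes the app's status on.
run_without_exec() {
  sh -c 'echo "wrapper=$$"; sh -c "echo app=\$\$"; exit $?'
}

# With exec: the wrapper is replaced by the app, so the app keeps the
# wrapper's PID. In a container that PID would be 1.
run_with_exec() {
  sh -c 'echo "wrapper=$$"; exec sh -c "echo app=\$\$"'
}

run_without_exec
run_with_exec
```

The first call prints two different PIDs, the second prints the same PID twice.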
1.4 Environment initialization in containers
Containers often run with:
- missing environment variables
- empty or minimal PATH
- no user dotfiles
- minimal locale settings
- no login shell semantics
This means:
- no .bashrc, .profile, .zshrc
- no aliases
- no functions
- no PATH modifications
- no locale configuration
Scripts must validate everything:
- required variables
- required tools
- required directories
- required files
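The checks for tools, directories, and files can be sketched as small helpers called at the top of an entrypoint. The helper names and the fallback PATH value are assumptions for illustration, not part of any standard.

```shell
#!/bin/sh
# Preflight helpers: each prints a reason to stderr and returns 1 on failure.

require_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing tool: $1" >&2; return 1; }
}

require_dir() {
  [ -d "$1" ] || { echo "missing directory: $1" >&2; return 1; }
}

require_file() {
  { [ -f "$1" ] && [ -r "$1" ]; } || {
    echo "missing or unreadable file: $1" >&2
    return 1
  }
}

# Containers may start with an empty PATH and never load dotfiles,
# so fall back to an explicit value instead of assuming one.
PATH="${PATH:-/usr/local/bin:/usr/bin:/bin}"
export PATH
```

A script would typically call these before doing any work and exit non-zero on the first failure, for example require_tool followed by || exit 1.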
1.5 Filesystem and PID namespace
Containers run with:
- isolated PID namespace
- isolated mount namespace
- overlayfs or unionfs
- ephemeral writable layers
This affects:
- process inspection (ps, /proc)
- file descriptors (/proc/1/fd)
- zombie visibility
- performance of heavy I/O loops
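For inspection from inside a container, a few helpers that read /proc directly can be enough. This is a sketch assuming a Linux /proc layout; the function names are hypothetical, and pid1_fds needs to run as the same user as PID 1 or as root.

```shell
#!/bin/sh
# Inspection helpers that read /proc, so they also work in images
# that ship without procps.

# Name of the process that is PID 1 in this PID namespace.
pid1_name() {
  cat /proc/1/comm 2>/dev/null || ps -o comm= -p 1
}

# Open file descriptors of PID 1.
pid1_fds() {
  ls -l /proc/1/fd 2>/dev/null
}

# Print "PID NAME" for every zombie visible in this namespace.
list_zombies() {
  for status_file in /proc/[0-9]*/status; do
    # A process can exit between the glob and the read, so ignore errors.
    awk '/^Name:/ {name=$2} /^State:/ {state=$2} /^Pid:/ {pid=$2}
         END {if (state == "Z") print pid, name}' "$status_file" 2>/dev/null
  done
  return 0
}
```

If pid1_name prints sh or bash, the shell is still PID 1 and the application is running as its child.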
2. Techniques
2.1 Always use exec in entrypoints
Your entrypoint should almost always end with exec, so that the application replaces the script once setup is done.
Benefits:
- PID 1 = app
- signals go directly to the app
- no extra shell layer
- no zombie shell
- simpler debugging
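A minimal sketch of such an entrypoint follows. It is written as a function so it can be tried safely in a subshell. APP_HOME and the setup steps are assumptions, and a real script would call entrypoint "$@" as its last line.

```shell
#!/bin/sh
# Sketch of an entrypoint body. The last line of a real script would be:
#   entrypoint "$@"
entrypoint() {
  # Setup that must finish before the app starts goes here.
  umask 027
  cd "${APP_HOME:-/}" || return 1   # APP_HOME is a hypothetical variable

  # Hand the process over to the command. Nothing after exec runs,
  # and the command inherits this shell's PID (PID 1 in a container).
  exec "$@"
}
```

Because exec never returns on success, the exit status the container runtime sees is the application's own.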
2.2 Use a minimal init system
PID 1 must:
- reap zombies
- forward signals
- exit cleanly
Shells are not init systems. Use:
- tini
- dumb-init
Example: set tini as the image entrypoint and let it start run.sh.
Then run.sh does its setup and hands over to the application with exec.
tini:
- reaps zombies
- forwards signals
- handles PID 1 semantics correctly
2.3 Validate environment variables before use
Check every required variable at the top of the entrypoint, before any real work starts.
If one is missing, the container exits fast and loud instead of failing later in weird ways.
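One way to sketch this check uses the POSIX ${VAR:?message} expansion. The variable names APP_PORT, DATABASE_URL, and LOG_LEVEL are hypothetical placeholders.

```shell
#!/bin/sh
# ${VAR:?message} aborts a non-interactive shell with a non-zero status and
# prints the message to stderr when VAR is unset or empty.
# APP_PORT, DATABASE_URL and LOG_LEVEL are hypothetical names.
check_config() {
  : "${APP_PORT:?APP_PORT must be set}"
  : "${DATABASE_URL:?DATABASE_URL must be set}"

  # Optional settings get an explicit default instead of a silent empty value.
  LOG_LEVEL="${LOG_LEVEL:-info}"
  export LOG_LEVEL
}
```

Calling check_config as the first step of an entrypoint stops the container before the application starts if either required variable is missing.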
2.4 Use predictable logging
Send every log line to stdout or stderr in one consistent format.
In containers:
- stdout/stderr are the primary logging channels
- logs are collected by Docker, Kubernetes, ECS, etc.
- structured logs are easier to parse
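A small logging helper is one way to keep the format consistent. The key=value layout and the field names ts, level, and msg are an illustrative choice, not a required convention.

```shell
#!/bin/sh
# log LEVEL MESSAGE...: one line per event on stderr, in key=value form.
# The field names (ts, level, msg) are an illustrative choice.
log() {
  level=$1
  shift
  printf 'ts=%s level=%s msg="%s"\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
}

log info "starting entrypoint"
```

Writing to stderr keeps log lines out of command substitutions and pipelines that consume the script's stdout.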
2.5 Use trap for graceful shutdown
Install a trap for SIGTERM and SIGINT that stops the application and lets the script exit cleanly.
This ensures:
- graceful shutdown on docker stop
- graceful shutdown on Kubernetes SIGTERM
- graceful shutdown on CI timeouts (if signals are forwarded)
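When the script has to stay alive beside the application, for example to run cleanup afterwards, signal forwarding can be sketched as follows. The function name run_with_forwarding is hypothetical. If no cleanup is needed, ending the script with exec is simpler.

```shell
#!/bin/sh
# run_with_forwarding CMD [ARGS...]
# Starts CMD in the background, forwards SIGTERM/SIGINT to it as SIGTERM,
# and returns CMD's exit status.
run_with_forwarding() {
  got_signal=0
  "$@" &
  child=$!
  trap 'got_signal=1; kill -TERM "$child" 2>/dev/null' TERM INT

  status=0
  wait "$child" || status=$?

  if [ "$got_signal" -eq 1 ]; then
    # A trapped signal interrupts wait early; wait again so the child is
    # reaped and its real exit status is collected.
    second=0
    wait "$child" 2>/dev/null || second=$?
    # 127 means the child had already been reaped by the first wait.
    if [ "$second" -ne 127 ]; then
      status=$second
    fi
  fi

  trap - TERM INT
  return "$status"
}
```

The application runs in the background on purpose: a shell defers traps while a foreground command is running, but the wait builtin returns as soon as a trapped signal arrives.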
3. Pitfalls
3.1 Shell as PID 1 not reaping zombies
Example: the entrypoint script starts worker in the background and never waits for it.
- worker exits → becomes zombie
- PID 1 (shell) does not reap it
- zombies accumulate
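For children the script starts itself, recording each PID and calling wait on it is enough to reap them. The sketch below uses a hypothetical worker function as a stand-in for real background work. It does not cover orphans that are re-parented to PID 1, which still need an init such as tini.

```shell
#!/bin/sh
# worker is a hypothetical stand-in for real background work.
worker() {
  sleep 1
}

# Start three workers, then wait for each PID. wait collects the exit
# status, which removes the finished child from the process table.
run_workers() {
  pids=""
  for i in 1 2 3; do
    worker &
    pids="$pids $!"
  done

  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

run_workers returns the number of workers that exited non-zero, so a caller can fail the container when any of them fails.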
3.2 SIGTERM not stopping the app
Entrypoint: a shell script that starts the app without exec.
Kubernetes sends SIGTERM to PID 1 (shell):
- shell may ignore SIGTERM
- app keeps running
- pod stuck in Terminating
3.3 Using sleep infinity in entrypoints
Example: a script that ends with sleep infinity to keep the container alive.
- container never exits
- SIGTERM may be ignored
- shutdown is broken
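If a container really has to idle, a signal-aware loop is one possible replacement. This is a sketch: the function name and the 60-second interval are arbitrary, and because the trap calls exit, the function should be the last step of the script.

```shell
#!/bin/sh
# Idle until SIGTERM or SIGINT arrives, then exit cleanly.
# A foreground sleep would delay the trap until sleep returns; sleeping in
# the background and using wait lets the trap run as soon as a signal lands.
idle_until_signal() {
  trap 'kill "$sleeper" 2>/dev/null; exit 0' TERM INT
  while :; do
    sleep 60 &
    sleeper=$!
    wait "$sleeper" || true
  done
}
```

Installing the trap also matters on its own: PID 1 only receives SIGTERM once a handler for it exists.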
3.4 Relying on interactive features
Containers do not load:
- .bashrc
- .profile
- .zshrc
No:
- aliases
- prompts
- interactive read
- interactive sudo
3.5 Using tail -f as the main process
Example: the entrypoint starts the app in the background and keeps the container alive with tail -f.
- PID 1 = tail
- app is separate process
- signals go to tail, not app
- shutdown is broken
4. Real-World Failures (Intro)
Below are concrete production incidents that stem from:
- PID 1 semantics
- missing exec
- missing zombie reaping
- incorrect signal trapping
- poor entrypoint design
Detailed cases are expanded in part 2/3.
4.1 Kubernetes pod refuses to terminate
Entrypoint: a shell script that stays PID 1 while the app runs as its child.
SIGTERM sent → shell ignores → pod stuck in Terminating.
4.2 Zombie processes accumulate in container
Long-running script spawns children but never reaps them.
4.3 Application never receives SIGTERM
Entrypoint: a single command that runs the app through a shell, without exec.
Shell remains PID 1 → app is child → SIGTERM goes to shell, not app.
5. Patterns (Intro)
- Use exec to replace the shell
- Use a real init system (tini, dumb-init)
- Validate environment early
- Use lightweight health checks
- Keep entrypoints small and deterministic
Patterns, anti-patterns, debugging, and performance are covered in full in parts 2/3 and 3/3.