
🐳 Advanced Shell in Containers

🧠 Overview

Shell scripts inside containers behave differently than on a normal Linux host. Containers change:

  • PID hierarchy
  • signal delivery
  • process groups
  • environment initialization
  • filesystem layout
  • entrypoint semantics
  • zombie reaping
  • logging and stdout/stderr behavior

This module explains how to write robust, production-grade shell scripts that run correctly inside Docker, Kubernetes, Nomad, ECS, Swarm, and containerized CI/CD environments.


🎓 Who this is for

  • DevOps/SRE building container images or entrypoints
  • Engineers writing startup scripts, health checks, or lifecycle hooks
  • Anyone debugging signal handling, zombie processes, or shutdown issues
  • People deploying applications in Kubernetes, Nomad, ECS, or Docker Swarm
  • Engineers writing Docker entrypoints, sidecars, init containers, or CI containers

🧩 Role in the Ecosystem

Container shell behavior interacts with:

Containers are where PID 1 semantics, signals, and zombies become very real.


1. Internals / Mechanics

1.1 PID 1 is special

Inside a container, the entrypoint becomes PID 1.

PID 1 has unique semantics:

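  • the kernel applies no default signal actions to it, so a signal such as SIGTERM is ignored unless PID 1 installs a handler
  • it inherits orphaned processes and must reap them itself by calling wait
  • it does not forward signals to its children unless that is explicitly implemented
  • it is responsible for clean shutdown
  • it is the ancestor of everything the entrypoint starts, and when it exits the container stops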

This is the root cause of many container bugs:

  • pods stuck in Terminating
  • containers ignoring SIGTERM
  • zombie processes accumulating
  • CI jobs hanging forever

1.2 Shell as PID 1 is dangerous

If your entrypoint is:

CMD ["sh", "-c", "run.sh"]

Then sh becomes PID 1 and:

  • does not reliably reap orphaned processes it inherits
  • ignores SIGTERM unless the script installs a trap
  • may not forward signals to the app
  • may leave zombies
  • may cause slow or broken shutdowns
  • may swallow exit codes

This is especially bad when:

  • the shell spawns background jobs
  • the shell uses pipelines
  • the shell uses subshells
  • the shell does not use exec

1.3 Exec vs non-exec entrypoints

Bad:

app "$@"

Good:

exec app "$@"

exec replaces the shell with the application:

  • no extra shell process
  • correct signal handling
  • no zombie shell
  • PID 1 is the actual app
  • simpler process tree

Without exec, you get:

  • PID 1 = shell
  • app is child of shell
  • SIGTERM goes to shell, not app
  • shell may ignore or mishandle signals

1.4 Environment initialization in containers

Containers often run with:

  • missing environment variables
  • empty or minimal PATH
  • no user dotfiles
  • minimal locale settings
  • no login shell semantics

This means:

  • no .bashrc, .profile, .zshrc
  • no aliases
  • no functions
  • no PATH modifications
  • no locale configuration

Scripts must validate everything:

  • required variables
  • required tools
  • required directories
  • required files
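
The checks for tools, directories, and files can be sketched as small helpers. The names require_cmd, require_dir, and require_file are assumptions for this example, not part of any standard tooling.

```shell
# Each helper prints a message to stderr and returns non-zero on failure,
# so an entrypoint can write: require_cmd curl || exit 1
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing tool: $1" >&2; return 1; }
}

require_dir() {
  [ -d "$1" ] || { echo "missing directory: $1" >&2; return 1; }
}

require_file() {
  [ -r "$1" ] || { echo "missing or unreadable file: $1" >&2; return 1; }
}
```

Calling these at the top of an entrypoint, before any real work starts, turns a confusing failure deep in the script into a single clear message at startup.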

1.5 Filesystem and PID namespace

Containers run with:

  • isolated PID namespace
  • isolated mount namespace
  • overlayfs or unionfs
  • ephemeral writable layers

This affects:

  • process inspection (ps, /proc)
  • file descriptors (/proc/1/fd)
  • zombie visibility
  • performance of heavy I/O loops
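
Because the container has its own PID namespace, /proc inside it lists only the container's processes, which makes zombie hunting straightforward even in images without ps. A minimal sketch, assuming a Linux /proc; the helper names proc_state and list_zombies are hypothetical.

```shell
# Print the one-letter state (R, S, Z, ...) from a /proc/<pid>/status file.
proc_state() {
  awk '/^State:/ { print $2; exit }' "$1" 2>/dev/null
}

# Print the PID of every process currently in state Z (zombie).
list_zombies() {
  for status in /proc/[0-9]*/status; do
    # Processes can vanish between the glob and the read; those are skipped.
    [ "$(proc_state "$status")" = "Z" ] || continue
    pid=${status#/proc/}
    printf '%s\n' "${pid%/status}"
  done
  return 0
}
```

A steadily growing list from list_zombies is a strong hint that PID 1 is not reaping the processes it inherits.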

2. Techniques

2.1 Always use exec in entrypoints

Your entrypoint should almost always end with exec:

#!/bin/sh
set -e

# optional: setup, logging, validation
: "${CONFIG_PATH:?CONFIG_PATH must be set}"

exec app "$@"

Benefits:

  • PID 1 = app
  • signals go directly to the app
  • no extra shell layer
  • no zombie shell
  • simpler debugging

2.2 Use a minimal init system

PID 1 must:

  • reap zombies
  • forward signals
  • exit cleanly

Shells are not init systems. Use:

  • tini
  • dumb-init

Example:

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["run.sh"]

Then in run.sh:

#!/bin/sh
set -e
exec app "$@"

tini:

  • reaps zombies
  • forwards signals
  • handles PID 1 semantics correctly

2.3 Validate environment variables before use

: "${CONFIG_PATH:?CONFIG_PATH must be set}"
: "${ENVIRONMENT:?ENVIRONMENT must be set}"
: "${SERVICE:?SERVICE must be set}"

If any of them is missing, the container exits fast and loud instead of failing later in weird ways.
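
The ${VAR:?} form stops at the first missing variable. When several may be missing, it can be friendlier to report them all at once. A possible sketch, with require_env as a made-up helper name:

```shell
# require_env NAME...
# Reports every unset or empty variable in one pass, then returns non-zero.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup; NAME must be a plain variable name written by the script author.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
}

# Usage in an entrypoint:
#   require_env CONFIG_PATH ENVIRONMENT SERVICE || exit 1
```

The eval is only safe because the names are literals in the script; it should not be fed user input.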


2.4 Use predictable logging

log() { printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2; }

In containers:

  • stdout/stderr are the primary logging channels
  • logs are collected by Docker, Kubernetes, ECS, etc.
  • structured logs are easier to parse
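
For the structured-logs point, a JSON-lines variant of the logger might look like the sketch below. The function name log_json and the field names ts, level, and msg are assumptions chosen for this example, not a standard schema.

```shell
# log_json LEVEL MESSAGE...
# Writes one JSON object per line to stderr. Only backslashes and double quotes
# are escaped; messages with newlines or control characters are not handled.
log_json() {
  level=$1
  shift
  msg=$(printf '%s' "$*" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$msg" >&2
}

# Usage:
#   log_json info "starting service"
```

One object per line keeps the output compatible with log collectors that split on newlines.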

2.5 Use trap for graceful shutdown

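cleanup() {
  log "Cleaning up..."
  # stop children, flush buffers, remove temp files, etc.
}

# Use signal names without the SIG prefix: the prefixed form is not portable
# across /bin/sh implementations (dash, for example, rejects it).
trap 'cleanup; exit 0' TERM INT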

This ensures:

  • graceful shutdown on docker stop
  • graceful shutdown on Kubernetes SIGTERM
  • graceful shutdown on CI timeouts (if signals are forwarded)
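
When the script cannot simply exec the app (for example because it has cleanup to do afterwards), the trap also has to pass the signal on to the child. The following is a sketch of that pattern under the assumption that the app shuts down on SIGTERM; run_forwarding is a hypothetical helper name.

```shell
# run_forwarding CMD [ARGS...]
# Runs CMD as a background child, forwards SIGTERM/SIGINT to it as SIGTERM,
# and returns the child's exit status.
run_forwarding() {
  "$@" &
  child=$!
  # Children started with & in a non-interactive shell ignore SIGINT,
  # so both signals are forwarded as SIGTERM.
  trap 'got_signal=1; kill -TERM "$child" 2>/dev/null' TERM INT
  while :; do
    got_signal=""
    status=0
    wait "$child" || status=$?
    # A trapped signal interrupts wait early; loop again to collect the real status.
    [ -n "$got_signal" ] || break
  done
  trap - TERM INT
  return "$status"
}

# Usage in an entrypoint:
#   run_forwarding app "$@"
#   exit $?
```

The key detail is that wait, unlike a foreground command, returns as soon as a trapped signal arrives, so the handler runs immediately instead of after the app exits.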

3. Pitfalls

3.1 Shell as PID 1 not reaping zombies

Example:

#!/bin/sh
worker &
sleep infinity

  • worker exits and becomes a zombie until its parent waits for it
  • PID 1 (shell) never calls wait, so it may never reap it
  • zombies accumulate
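
If the shell has to stay around as the parent, it should wait for every child it starts. A minimal sketch, where run_workers is an illustrative name and each argument is a command string; note that this only reaps direct children, so orphans reparented to PID 1 still call for a real init.

```shell
# run_workers 'cmd1' 'cmd2' ...
# Starts every command in the background, waits for each one (which reaps it),
# and prints how many of them exited non-zero.
run_workers() {
  pids=""
  for job in "$@"; do
    sh -c "$job" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  printf '%s\n' "$failed"
}

# Usage:
#   run_workers 'true' 'false' 'exit 3'
```

Waiting on each PID individually also preserves the exit status of every worker, which a bare wait would discard.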

3.2 SIGTERM not stopping the app

Entrypoint:

#!/bin/sh
app &
wait

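Kubernetes sends SIGTERM to PID 1 (the shell):

  • with no trap installed, the shell ignores SIGTERM
  • app keeps running
  • pod stays in Terminating until the grace period (30 seconds by default) expires and the container is killed with SIGKILL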

3.3 Using sleep infinity in entrypoints

#!/bin/sh
setup
sleep infinity

  • container never exits on its own
  • SIGTERM may be ignored
  • shutdown is broken
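
If a container really must idle, a signal-aware loop is one alternative. This sketch assumes the container should simply exit on SIGTERM or SIGINT; idle_until_signal is a hypothetical name.

```shell
# idle_until_signal
# Blocks until SIGTERM or SIGINT arrives, then returns so the script can exit.
idle_until_signal() {
  stop=""
  trap 'stop=1' TERM INT
  while [ -z "$stop" ]; do
    # A foreground sleep would delay the trap until it finished;
    # a background sleep plus wait is interrupted as soon as a signal arrives.
    sleep 3600 &
    sleeper=$!
    wait "$sleeper" || true
    kill "$sleeper" 2>/dev/null || true
  done
  trap - TERM INT
}

# Usage in an entrypoint:
#   setup
#   idle_until_signal
#   exit 0
```

Installing the trap is what makes the difference for PID 1: once a handler exists, the kernel delivers the signal instead of dropping it.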

3.4 Relying on interactive features

Entrypoint shells are non-interactive, non-login shells, so they do not load:

  • .bashrc
  • .profile
  • .zshrc

Unless a TTY is attached, there are also no:

  • aliases
  • prompts
  • interactive read
  • interactive sudo

3.5 Using tail -f as the main process

tail -f /var/log/app.log

  • PID 1 = tail
  • app is a separate process
  • signals go to tail, not app
  • shutdown is broken

4. Real-World Failures (Intro)

The incidents below come from production and trace back to:

  • PID 1 semantics
  • missing exec
  • missing zombie reaping
  • incorrect signal trapping
  • poor entrypoint design

Detailed cases are covered in part 2/3.


4.1 Kubernetes pod refuses to terminate

Entrypoint:

#!/bin/sh
app &
wait

SIGTERM sent → shell ignores it → pod stuck in Terminating.


4.2 Zombie processes accumulate in container

A long-running script spawns children but never reaps them.


4.3 Application never receives SIGTERM

Entrypoint:

app "$@"

Shell remains PID 1 → app is its child → SIGTERM goes to the shell, not the app.


5. Patterns (Intro)

  • Use exec to replace the shell
  • Use a real init system (tini, dumb-init)
  • Validate environment early
  • Use lightweight health checks
  • Keep entrypoints small and deterministic
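
As an illustration of a lightweight health check, the sketch below tests whether the process named in a PID file still exists. It assumes the app writes such a file, which the text above does not state; is_alive and the path in the usage comment are hypothetical.

```shell
# is_alive PIDFILE
# Succeeds when PIDFILE names a process that still exists.
# kill -0 sends no signal; it only checks that the PID can be signalled.
is_alive() {
  [ -r "$1" ] || return 1
  pid=$(cat "$1")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage as a health check command (path is an example only):
#   is_alive /run/app.pid || exit 1
```

A check like this forks almost nothing and finishes in milliseconds, but it only proves the process exists, not that it is serving requests.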

Patterns, anti-patterns, debugging, and performance are covered in full in parts 2/3 and 3/3.