🧾 Extended Text Processing

Efficient text parsing is essential for log analysis, config management, and data transformation.

🧭 Core Tools

Tool	Purpose	Portability
`grep`	Pattern matching	✅ POSIX
`sed`	Stream editor	✅ POSIX
`awk`	Column-based processing	✅ POSIX
`cut`	Extract columns	✅ POSIX
`sort`	Sort lines	✅ POSIX
`uniq`	Remove adjacent duplicate lines (sort first)	✅ POSIX
`tr`	Character translation	✅ POSIX
`jq`	JSON processing	❌ External
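
The table lists `sort`, `uniq`, and `tr` without showing them in use. A minimal sketch on made-up sample words: `uniq` only collapses adjacent duplicates, so the input is sorted first, and `tr` normalises case beforehand.

```shell
# Made-up sample input with mixed case and non-adjacent duplicates.
words='Apple
banana
apple
Banana
apple'

# tr lowercases, sort groups identical lines, uniq -c counts each group,
# and the final sort -nr puts the most frequent word first.
counts=$(printf '%s\n' "$words" | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"

# Without a preceding sort, uniq leaves non-adjacent duplicates in place.
unsorted_lines=$(printf 'a\nb\na\n' | uniq | wc -l)
echo "lines after uniq alone: $unsorted_lines"
```

The exact padding in front of the counts varies between implementations, so scripts that parse `uniq -c` output are safer reading it with `awk '{print $1, $2}'` than with `cut`.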

🧪 grep — Pattern Matching

Basic Usage

grep ERROR logfile.txt
grep -i warning messages.log      # Case-insensitive
grep -v DEBUG logfile.txt         # Invert match (exclude DEBUG)
grep -E "error|warning" log.txt   # Extended regex

Useful Flags

Flag	Purpose
`-i`	Case-insensitive
`-v`	Invert match
`-n`	Show line numbers
`-c`	Count matching lines
`-l`	List files with matches
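`-r`	Recursive search (GNU/BSD extension)
`-E`	Extended regex
`-o`	Show only matching part (GNU/BSD extension)
`-A N`	Show N lines after match (GNU/BSD extension)
`-B N`	Show N lines before match (GNU/BSD extension)
`-C N`	Show N lines of context before and after (GNU/BSD extension)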
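
A short sketch of how `-c`, `-o`, and `-n` differ, run against a made-up log written to a temporary file. It assumes a grep that supports `-o` (GNU and BSD grep both do).

```shell
# Made-up sample log; the second line contains ERROR twice.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
INFO start
ERROR disk full; ERROR retry failed
WARN low memory
ERROR timeout
EOF

# -c counts matching lines, not individual matches.
matching_lines=$(grep -c ERROR "$tmp")

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o ERROR "$tmp" | wc -l)

# -n prefixes each matching line with its line number.
numbered=$(grep -n WARN "$tmp")

echo "lines=$matching_lines occurrences=$occurrences warn=$numbered"
rm -f "$tmp"
```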

Practical Examples

Count lines containing ERROR in each file:

grep -c ERROR *.log

Show context around matches:

grep -C 3 "Exception" app.log

Multiple patterns:

grep -E "ERROR|WARNING|CRITICAL" system.log

🧪 sed — Stream Editor

Basic Substitution

sed 's/old/new/' file.txt           # First occurrence per line
sed 's/old/new/g' file.txt          # All occurrences
sed 's/old/new/2' file.txt          # Second occurrence only
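
To see the three substitution forms side by side, the sketch below applies each one to the same made-up input line.

```shell
line='old old old'

# No flag: only the first occurrence on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/old/new/')

# g flag: every occurrence on each line is replaced.
all=$(printf '%s\n' "$line" | sed 's/old/new/g')

# Numeric flag: only the Nth occurrence on each line is replaced.
second=$(printf '%s\n' "$line" | sed 's/old/new/2')

printf '%s\n' "$first" "$all" "$second"
```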

In-Place Editing

sed -i 's/old/new/g' file.txt       # Modify file directly (GNU sed; BSD/macOS sed needs: sed -i '' ...)
sed -i.bak 's/old/new/g' file.txt   # Keep a backup as file.txt.bak (works with GNU and BSD sed)
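
Because `-i` is not part of POSIX sed and its syntax differs between GNU and BSD, a portable alternative is to write to a temporary file and move it into place. A minimal sketch, using a made-up config file:

```shell
# Made-up file to edit.
conf=$(mktemp)
printf 'mode=old\nname=demo\n' > "$conf"

# Write the edited copy to a temporary file; replace the original only if sed succeeded.
tmp=$(mktemp)
if sed 's/old/new/g' "$conf" > "$tmp"; then
    mv "$tmp" "$conf"
else
    rm -f "$tmp"
fi

result=$(cat "$conf")
printf '%s\n' "$result"
rm -f "$conf"
```

Note that `mv` replaces the file, so its permissions and ownership become those of the temporary file.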

Delete Lines

sed '/pattern/d' file.txt           # Delete matching lines
sed '1,10d' file.txt                # Delete lines 1-10
sed '$d' file.txt                   # Delete last line
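
The three delete forms can be checked on a small numbered input. This sketch joins each result onto one line with `paste` for easier comparison.

```shell
numbers=$(printf '1\n2\n3\n4\n5\n')

# Delete lines matching a pattern.
no_three=$(printf '%s\n' "$numbers" | sed '/3/d' | paste -s -d ' ' -)

# Delete a line range.
no_range=$(printf '%s\n' "$numbers" | sed '2,3d' | paste -s -d ' ' -)

# Delete the last line.
no_last=$(printf '%s\n' "$numbers" | sed '$d' | paste -s -d ' ' -)

printf '%s\n' "$no_three" "$no_range" "$no_last"
```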

Insert and Append

sed '1i\Header line' file.txt       # Insert before line 1 (one-line form is a GNU extension)
sed '$a\Footer line' file.txt       # Append after last line (one-line form is a GNU extension)
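
For scripts that must also run on BSD or macOS sed, the POSIX form puts the text on the line after the backslash. A minimal sketch on made-up input:

```shell
# POSIX syntax: "i\" or "a\" is followed by a newline, then the text to add.
decorated=$(printf 'row 1\nrow 2\n' | sed '1i\
Header line
$a\
Footer line')

printf '%s\n' "$decorated"
```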

🧪 awk — Column Processing

Print Columns

awk '{print $1, $3}' data.txt       # Print columns 1 and 3
awk -F',' '{print $2}' csv.csv      # Comma-delimited; does not handle quoted fields

Filtering

awk -F',' '$3 > 100' sales.csv      # Rows where col 3 > 100 (CSV needs -F',')
awk '/ERROR/ {print $0}' log.txt    # Lines containing ERROR
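
A small sketch of numeric filtering on a comma-separated file. The file layout (a header row plus `region,rep,amount` columns) is made up for illustration.

```shell
# Made-up sales data with a header row.
csv=$(mktemp)
cat > "$csv" <<'EOF'
region,rep,amount
north,ann,250
south,bob,75
east,cy,120
EOF

# -F',' splits on commas. NR > 1 skips the header, which would otherwise be
# compared as a string and could slip through the filter.
big=$(awk -F',' 'NR > 1 && $3 > 100' "$csv")

printf '%s\n' "$big"
rm -f "$csv"
```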

Calculations

awk '{sum += $1} END {print "Total:", sum}' numbers.txt
awk '{count++} END {print "Lines:", count}' file.txt
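
The same accumulate-then-report pattern extends to averages. The sketch below also shows one way to get a clean `0` for empty input, where an unset awk variable would otherwise print as an empty string.

```shell
# Sum and average of the first column; NR is the number of records read.
stats=$(printf '10\n20\n30\n' | awk '{sum += $1} END {if (NR > 0) printf "total=%d avg=%.1f\n", sum, sum / NR}')

# Adding 0 forces a numeric result even when no line was ever counted.
empty=$(printf '' | awk '{count++} END {print "Lines:", count + 0}')

printf '%s\n' "$stats" "$empty"
```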

Formatting

awk '{printf "%-10s %5d\n", $1, $2}' data.txt
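
In that format string, `%-10s` left-aligns a string in 10 characters and `%5d` right-aligns an integer in 5. A sketch with a matching header row, on made-up data:

```shell
# BEGIN prints the header once; the main rule formats every input line.
table=$(printf 'alice 5\nbob 12\n' | awk '
    BEGIN { printf "%-10s %5s\n", "NAME", "COUNT" }
          { printf "%-10s %5d\n", $1, $2 }')

printf '%s\n' "$table"
```

Every output line has the same width (10 + 1 + 5 characters), which keeps the columns aligned.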

🧪 cut — Simple Column Extraction

cut -d',' -f1,3 data.csv            # Columns 1 and 3
cut -c1-10 file.txt                 # Characters 1-10
cut -f2 -d' ' names.txt             # Second field (space delimiter)
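
One caveat worth sketching: `cut` treats every single delimiter as a field boundary, while `awk` with its default separator collapses runs of whitespace. The input line is made up and contains two spaces.

```shell
line='alice  smith'   # two spaces between the words

# cut sees three fields: "alice", "" and "smith", so field 2 is empty.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk's default splitting treats the run of spaces as one separator.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

echo "cut: [$with_cut] awk: [$with_awk]"
```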

🧠 Combining Tools

Classic Pipeline

grep "POST /api" access.log \
  | awk '{print $1}' \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -10

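This lists the 10 client IPs that send the most POST requests to paths starting with /api, assuming the IP is the first field of each log line.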
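
The filtering and counting stages can also be folded into one awk program, leaving only the final ordering to `sort`. A sketch on a made-up access log in the common log format:

```shell
# Made-up access log; the client IP is the first field.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "POST /api/items HTTP/1.1" 201 12
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "POST /api/items HTTP/1.1" 500 0
10.0.0.3 - - [01/Jan/2024:00:00:04 +0000] "POST /api/login HTTP/1.1" 200 34
EOF

# awk filters and counts per IP in an associative array; sort orders by count.
top=$(awk '/POST \/api/ {hits[$1]++} END {for (ip in hits) print hits[ip], ip}' "$log" \
    | sort -nr \
    | head -10)

printf '%s\n' "$top"
rm -f "$log"
```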

Log Analysis Example

# Count status codes
awk '{print $9}' access.log | sort | uniq -c | sort -nr

# Find slowest requests (assumes the request time in seconds is the last field)
awk '$NF > 1.0 {print $NF, $0}' access.log | sort -nr
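
Both commands depend on the log layout. The sketch below uses a made-up log where the status code is field 9 and a request time in seconds has been appended as the last field; real formats may differ.

```shell
# Made-up log: status is field 9, path is field 7, request time is the last field.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 0.120
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 2.500
10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "POST /c HTTP/1.1" 500 0 1.750
EOF

# Count status codes, printed as code=count with the most frequent first.
codes=$(awk '{print $9}' "$log" | sort | uniq -c | sort -nr | awk '{print $2 "=" $1}')

# Print the duration first so sort can order on the leading number.
# LC_ALL=C keeps "." as the decimal separator regardless of locale.
slow=$(awk '$NF > 1.0 {print $NF, $7}' "$log" | LC_ALL=C sort -nr)

printf '%s\n' "$codes" "$slow"
rm -f "$log"
```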

🧪 jq — JSON Processing

Basic Usage

echo '{"name":"Alice","age":30}' | jq '.name'
# Output: "Alice"

echo '{"users":[{"name":"Bob"},{"name":"Carol"}]}' | jq '.users[].name'
# Output:
# "Bob"
# "Carol"
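
By default jq prints strings as JSON, quotes included. For shell pipelines the `-r` (raw output) flag is usually what you want. A sketch that skips itself when jq is not installed:

```shell
# jq is an external tool, so check for it first.
command -v jq >/dev/null 2>&1 || { echo "jq not found; skipping example" >&2; exit 0; }

json='{"users":[{"name":"Bob"},{"name":"Carol"}]}'

# Default output keeps the JSON quotes around each string.
quoted=$(printf '%s\n' "$json" | jq '.users[].name')

# -r prints raw strings, one per line, ready for other tools.
raw=$(printf '%s\n' "$json" | jq -r '.users[].name')

printf '%s\n' "$quoted" "$raw"
```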

Filtering

jq '.[] | select(.age > 18)' users.json
jq '.users[] | select(.active == true)' data.json

Formatting

jq -c '.' messy.json               # Compact output
jq '{name, age}' user.json          # Select fields
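
Filters and field selection compose with `|` inside a single jq program. A sketch on a made-up user list (the `name`, `age`, and `active` keys are assumptions for illustration):

```shell
# jq is an external tool, so check for it first.
command -v jq >/dev/null 2>&1 || { echo "jq not found; skipping example" >&2; exit 0; }

users='[{"name":"Ann","age":17,"active":true},{"name":"Ben","age":34,"active":false},{"name":"Cy","age":52,"active":true}]'

# Keep adults only, reduce each object to two fields, one compact object per line.
adults=$(printf '%s\n' "$users" | jq -c '.[] | select(.age > 18) | {name, age}')

# select(.active) keeps objects whose "active" value is truthy; -r prints bare names.
active_names=$(printf '%s\n' "$users" | jq -r '.[] | select(.active) | .name')

printf '%s\n' "$adults" "$active_names"
```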

🧾 Performance Tips

Pattern	Performance	Notes
`grep | sed | awk`	⚠️ Slower	One process per stage; often reducible to a single awk program
`awk` alone	✅ Fast	Single process
`grep -o`	✅ Fast	Built-in extraction
`sed -i`	⚠️ Slow on large files	Rewrites the entire file, even for a one-line change
`jq`	✅ Fast	Purpose-built for JSON

🧾 Summary

Master grep, sed, awk, cut, sort, uniq.
Combine tools in pipelines for complex tasks.
Prefer a single awk program over a chain of grep, sed, and cut when one tool can do the job: it avoids extra processes.
jq is essential for JSON processing.
Always quote variables to prevent word splitting.
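
The quoting rule matters as soon as a file name contains a space. A minimal sketch with a hypothetical file name, showing the same grep call with and without quotes:

```shell
# Hypothetical file whose name contains a space.
dir=$(mktemp -d)
file="$dir/app log.txt"
printf 'ERROR one\nINFO two\nERROR three\n' > "$file"

# Quoted: grep receives one file argument.
quoted_count=$(grep -c ERROR "$file")

# Unquoted: the shell splits the name into two non-existent paths and grep fails.
if grep -c ERROR $file >/dev/null 2>&1; then
    unquoted=ok
else
    unquoted=failed
fi

echo "quoted=$quoted_count unquoted=$unquoted"
rm -rf "$dir"
```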

👉 Continue to: Portability Patterns