🧾 Extended Text Processing
Efficient text parsing is essential for log analysis, config management, and data transformation.
| Tool | Purpose | Portability |
|------|---------|-------------|
| grep | Pattern matching | ✅ POSIX |
| sed | Stream editor | ✅ POSIX |
| awk | Column-based processing | ✅ POSIX |
| cut | Extract columns | ✅ POSIX |
| sort | Sort lines | ✅ POSIX |
| uniq | Remove duplicates | ✅ POSIX |
| tr | Character translation | ✅ POSIX |
| jq | JSON processing | ❌ External |
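The table lists sort, uniq, and tr, which often appear together in pipelines. As a minimal sketch (the sample sentence and the variable names `text` and `word_counts` are made up for illustration), a word-frequency count might look like this:

```shell
# Word-frequency count built from tr, sort, and uniq.
# The sample text is hypothetical.
text='the quick fox and the lazy dog and the cat'

# tr -s ' ' '\n'  puts each word on its own line
# sort            groups equal words (uniq only merges adjacent duplicates)
# uniq -c         prefixes each distinct word with its count
# sort -nr        orders by count, highest first
# awk             strips the padding uniq adds before the count
word_counts=$(printf '%s\n' "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{print $1, $2}')

printf '%s\n' "$word_counts"
```

The sort before uniq matters: without it, repeated words that are not adjacent would be counted separately.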
🧪 grep: Pattern Matching
Basic Usage
| grep ERROR logfile.txt
grep -i warning messages.log # Case-insensitive
grep -v DEBUG logfile.txt # Invert match (exclude DEBUG)
grep -E "error|warning" log.txt # Extended regex
|
Useful Flags
| Flag | Purpose |
|------|---------|
| -i | Case-insensitive |
| -v | Invert match |
| -n | Show line numbers |
| -c | Count matches |
| -l | List files with matches |
| -r | Recursive search |
| -E | Extended regex |
| -o | Show only matching part |
| -A N | Show N lines after match |
| -B N | Show N lines before match |
| -C N | Show N lines context |
Practical Examples
Count errors per file:
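One way to do this is `grep -c`, which prints a separate count for each file when several files are given. The sketch below assumes two hypothetical log files, `app.log` and `db.log`, created in a temporary directory, and assumes `mktemp` is available:

```shell
# Create two small sample logs (hypothetical contents).
logdir=$(mktemp -d)
printf 'ERROR a\nINFO b\nERROR c\n' > "$logdir/app.log"
printf 'INFO x\nERROR y\n' > "$logdir/db.log"

# With more than one file, -c prints "filename:count" per file.
per_file=$(cd "$logdir" && grep -c 'ERROR' app.log db.log)
printf '%s\n' "$per_file"

rm -rf "$logdir"
```

Note that `-c` counts matching lines, not individual matches, so a line containing ERROR twice still counts once.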
Show context around matches:
| grep -C 3 "Exception" app.log
|
Multiple patterns:
| grep -E "ERROR|WARNING|CRITICAL" system.log
|
🧪 sed: Stream Editor
Basic Substitution
| sed 's/old/new/' file.txt # First occurrence per line
sed 's/old/new/g' file.txt # All occurrences
sed 's/old/new/2' file.txt # Second occurrence only
|
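To make the difference between the three forms concrete, here is a small sketch that applies each one to the same made-up input line (the variable names are illustrative only):

```shell
# A sample line with three occurrences of "old".
line='old old old'

first=$(printf '%s\n' "$line" | sed 's/old/new/')     # replaces only the first occurrence
all=$(printf '%s\n' "$line" | sed 's/old/new/g')      # replaces every occurrence
second=$(printf '%s\n' "$line" | sed 's/old/new/2')   # replaces only the second occurrence

printf '%s\n' "$first" "$all" "$second"
```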
In-Place Editing
| sed -i 's/old/new/g' file.txt # Modify file directly (GNU syntax; BSD/macOS sed requires -i '')
sed -i.bak 's/old/new/g' file.txt # Modify file and keep a backup as file.txt.bak
|
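Because `-i` is an extension whose syntax differs between GNU and BSD sed, scripts that must run on both sometimes avoid it. A minimal sketch of that approach, assuming `mktemp` is available and using a hypothetical config file, writes the result to a temporary file and then replaces the original:

```shell
# Sample file to edit (hypothetical contents).
cfg=$(mktemp)
printf 'host=old.example.com\nport=80\n' > "$cfg"

# Edit into a temporary file; replace the original only if sed succeeded.
tmp=$(mktemp)
sed 's/old/new/g' "$cfg" > "$tmp" && mv "$tmp" "$cfg"

edited=$(cat "$cfg")
printf '%s\n' "$edited"

rm -f "$cfg"
```

One trade-off of this design: `mv` swaps in a new file, so the result takes on the temporary file's permissions. Where that matters, copying the content back with `cat "$tmp" > "$cfg"` keeps the original file's mode.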
Delete Lines
| sed '/pattern/d' file.txt # Delete matching lines
sed '1,10d' file.txt # Delete lines 1-10
sed '$d' file.txt # Delete last line
|
Insert and Append
| sed '1i\Header line' file.txt # Insert before line 1 (one-line form is a GNU extension)
sed '$a\Footer line' file.txt # Append after last line (one-line form is a GNU extension)
|
🧪 awk: Column Processing
Print Columns
| awk '{print $1, $3}' data.txt # Print columns 1 and 3
awk -F',' '{print $2}' csv.csv # CSV with comma delimiter
|
Filtering
| awk -F',' '$3 > 100' sales.csv # Rows where col 3 > 100 (comma-delimited)
awk '/ERROR/ {print $0}' log.txt # Lines containing ERROR
|
Calculations
| awk '{sum += $1} END {print "Total:", sum}' numbers.txt
awk '{count++} END {print "Lines:", count}' file.txt
|
| awk '{printf "%-10s %5d\n", $1, $2}' data.txt # Col 1 left-aligned (width 10), col 2 right-aligned (width 5)
|
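Filtering and calculations combine naturally once awk arrays are involved. The following sketch assumes a hypothetical comma-separated sales file with the columns region, product, and amount; it sums the amount per region and lists the rows above a threshold:

```shell
# Hypothetical sales data: region,product,amount
sales=$(mktemp)
cat > "$sales" <<'EOF'
north,widget,120
south,widget,80
north,gadget,50
south,gadget,200
EOF

# Sum column 3 per value of column 1. The for-in order of awk arrays
# is unspecified, so the result is piped through sort.
totals=$(awk -F',' '{sum[$1] += $3} END {for (r in sum) printf "%s %d\n", r, sum[r]}' "$sales" | sort)

# Rows where column 3 exceeds 100, printing region and product.
big=$(awk -F',' '$3 > 100 {print $1, $2}' "$sales")

printf '%s\n' "$totals" "$big"
rm -f "$sales"
```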
🧪 cut: Simple Column Extraction
| cut -d',' -f1,3 data.csv # Columns 1 and 3
cut -c1-10 file.txt # Characters 1-10
cut -f2 -d' ' names.txt # Second field (space delimiter)
|
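One behavior worth knowing when choosing between cut and awk: cut treats every single delimiter character as a field boundary, while awk's default splitting collapses runs of whitespace. A small sketch with a made-up input line shows the difference:

```shell
# Sample line with three spaces between the two values.
line='alice   42'

# cut sees empty fields between consecutive spaces, so field 2 is empty.
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk splits on runs of whitespace by default, so field 2 is the number.
awk_field=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]  awk: [%s]\n' "$cut_field" "$awk_field"
```

For column-aligned output padded with spaces, awk is usually the safer choice; cut fits data with exactly one delimiter between fields.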
Classic Pipeline
| grep "POST /api" access.log \
| awk '{print $1}' \
| sort \
| uniq -c \
| sort -nr \
| head -n 10
|
This lists the 10 client IPs that send the most POST /api requests, assuming the IP address is the first field of each log line.
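The same ranking can also be produced with a single awk process doing the filtering and counting, leaving sort only to order the result. This is a sketch under assumptions: the sample access log is invented and follows the common log format, with the client IP as the first field.

```shell
# Hypothetical access log in common log format.
access=$(mktemp)
cat > "$access" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "POST /api/login HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 1024
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "POST /api/items HTTP/1.1" 201 128
10.0.0.3 - - [01/Jan/2024:00:00:04 +0000] "POST /api/login HTTP/1.1" 401 64
EOF

# awk filters the POST /api lines and counts hits per IP in one pass;
# sort -nr ranks the counts and head keeps the top 10.
top_ips=$(awk '/POST \/api/ {hits[$1]++} END {for (ip in hits) print hits[ip], ip}' "$access" | sort -nr | head -n 10)

printf '%s\n' "$top_ips"
rm -f "$access"
```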
Log Analysis Example
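| # Count status codes (field 9 in the common/combined log format)
awk '{print $9}' access.log | sort | uniq -c | sort -nr
# Find slowest requests (assumes the request time in seconds is logged as the last field)
awk '$NF > 1.0 {print $NF, $0}' access.log | sort -nr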
|
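Both commands depend on the log layout, so it helps to try them on a small sample first. The sketch below assumes a hypothetical log whose lines end with the request time in seconds; that field is not part of the standard common log format and has to be configured in the web server.

```shell
# Hypothetical access log: status is field 9, request time is the last field.
access=$(mktemp)
cat > "$access" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 0.120
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 500 64 2.500
10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "GET /c HTTP/1.1" 200 900 1.700
10.0.0.4 - - [01/Jan/2024:00:00:04 +0000] "GET /d HTTP/1.1" 404 32 0.030
EOF

# Requests per status code, most frequent first (awk trims uniq's padding).
status_counts=$(awk '{print $9}' "$access" | sort | uniq -c | sort -nr | awk '{print $1, $2}')

# Requests slower than 1 second: print the time first so that a plain
# numeric sort orders by it, followed by the request path (field 7).
slowest=$(awk '$NF > 1.0 {print $NF, $7}' "$access" | sort -nr)

printf '%s\n' "$status_counts" "$slowest"
rm -f "$access"
```

Printing the time as the first column avoids having to tell sort which field number holds it, which varies with the log format.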
🧪 jq: JSON Processing
Basic Usage
| echo '{"name":"Alice","age":30}' | jq '.name'
# Output: "Alice"
echo '{"users":[{"name":"Bob"},{"name":"Carol"}]}' | jq '.users[].name'
# Output: "Bob" and "Carol", one per line
|
Filtering
| jq '.[] | select(.age > 18)' users.json
jq '.users[] | select(.active == true)' data.json
|
| jq -c '.' messy.json # Compact output
jq '{name, age}' user.json # Keep only the name and age fields
|
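Since jq is an external tool, a script that uses it may want to check that it is installed. The following is a minimal sketch with an invented JSON document; the fallback string `jq not installed` is just a placeholder chosen for this example.

```shell
# Hypothetical JSON document.
json='{"users":[{"name":"Bob","active":true},{"name":"Carol","active":false}]}'

if command -v jq >/dev/null 2>&1; then
  # -r prints raw strings without the surrounding JSON quotes.
  active_names=$(printf '%s' "$json" | jq -r '.users[] | select(.active == true) | .name')
else
  active_names='jq not installed'
fi

printf '%s\n' "$active_names"
```

Without `-r`, jq would print the name as a JSON string, including the double quotes.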
| Pattern | Performance | Notes |
|---------|-------------|-------|
| grep \| sed \| awk | ⚠️ Slow | Multiple processes |
| awk alone | ✅ Fast | Single process |
| grep -o | ✅ Fast | Built-in extraction |
| sed -i | ⚠️ Slow | Rewrites entire file |
| jq | ✅ Fast | Optimized for JSON |
🧾 Summary
- Master grep, sed, awk, cut, sort, and uniq.
- Combine tools in pipelines for complex tasks.
- Use awk when possible; a single awk process is usually faster than a chain of several tools.
- jq is essential for JSON processing.
- Always quote variables to prevent word splitting.
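The last point is easy to see with a file name that contains a space. In this sketch (the file name is hypothetical, and the behavior shown is that of POSIX sh and bash), the quoted variable reaches grep as one argument, while the unquoted one is split into two:

```shell
# A file whose name contains a space (hypothetical name).
dir=$(mktemp -d)
file="$dir/my report.txt"
printf 'ERROR one\nINFO two\n' > "$file"

# Quoted: the whole path is passed to grep as a single argument.
quoted_count=$(grep -c 'ERROR' "$file")

# Unquoted: the shell splits the path at the space, so grep is asked to
# read two files that do not exist and exits with an error.
if grep -c 'ERROR' $file >/dev/null 2>&1; then
  unquoted_status=ok
else
  unquoted_status=failed
fi

printf 'quoted: %s, unquoted: %s\n' "$quoted_count" "$unquoted_status"
rm -rf "$dir"
```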
Continue to: Portability Patterns