
👁️ AI Shell: Hallucination Detection

A "hallucination" in the context of AI-assisted shell scripting occurs when the LLM invents non-existent flags, mixes up GNU and BSD toolsets, or uses outdated syntax. Detecting these hallucinations before execution is critical.


🎯 Common Shell Hallucinations

LLMs frequently hallucinate in specific, predictable areas. Be highly suspicious of the following:

1. The GNU vs BSD Mix-up

This is the most common hallucination: the LLM assumes you have GNU tools on macOS/BSD, or vice versa.

# ❌ Hallucination (GNU-only syntax fails on macOS/BSD)
sed -i 's/foo/bar/g' file.txt   # GNU syntax. BSD requires -i ''
date -d "yesterday"             # GNU syntax. BSD requires -v-1d
find . -mtime -1 -delete        # Works with GNU and BSD find, but -delete is not POSIX

# ✅ Verification
# Always specify your OS in the prompt, or ask the AI for a POSIX-compliant equivalent.
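
In scripts you maintain yourself, one way to sidestep the mix-up is to pick the syntax at runtime. The following is a minimal sketch, assuming that GNU sed and GNU date accept --version while the BSD variants reject it; sed_inplace and yesterday are hypothetical helper names, not standard commands.

```shell
#!/bin/bash
# Hypothetical helpers: choose GNU or BSD syntax at runtime.
# Assumption: GNU sed/date accept --version; the BSD variants exit non-zero.

if sed --version >/dev/null 2>&1; then
    sed_inplace() { sed -i "$@"; }        # GNU: -i takes no separate argument
else
    sed_inplace() { sed -i '' "$@"; }     # BSD: -i requires a backup suffix ('' = none)
fi

if date --version >/dev/null 2>&1; then
    yesterday() { date -d "yesterday" +%Y-%m-%d; }   # GNU date
else
    yesterday() { date -v-1d +%Y-%m-%d; }            # BSD date
fi
```

With these defined, sed_inplace 's/foo/bar/g' file.txt behaves the same on both toolsets, and an AI-generated script can be asked to call the wrappers instead of the raw commands.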

2. Fake awk and jq Functions

LLMs often invent higher-level programming functions inside text processing tools.

# ❌ Hallucination
jq '.users | filter(.age > 30) | sort_by_key(.name)' data.json
# jq has no 'filter' or 'sort_by_key' builtin; the real ones are 'select' and 'sort_by'.

# ✅ Proper syntax
jq '.users | map(select(.age > 30)) | sort_by(.name)' data.json
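
A cheap way to catch an invented function is to run the filter against a tiny sample before pointing it at real data, since jq exits with a non-zero status when a filter calls a function it does not define. The sketch below assumes jq is installed; check_jq_filter and the sample document are hypothetical and only illustrate the idea.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if jq can compile and run the filter on the sample.
check_jq_filter() {
    local filter="$1" sample="$2"
    printf '%s\n' "$sample" | jq "$filter" >/dev/null 2>&1
}

# Made-up sample data shaped like the users example.
SAMPLE='{"users":[{"name":"bo","age":40},{"name":"al","age":35},{"name":"cy","age":20}]}'

if command -v jq >/dev/null 2>&1; then
    check_jq_filter '.users | map(select(.age > 30)) | sort_by(.name)' "$SAMPLE" \
        && echo "filter OK"
    check_jq_filter '.users | filter(.age > 30)' "$SAMPLE" \
        || echo "filter rejected"
fi
```

A filter that passes this check can still be logically wrong, so compare its output on the sample with what you expect.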

3. Imaginary CLI Flags

LLMs will confidently append flags that sound logical but do not exist.

# ❌ Hallucination
tar --extract --file archive.tar.gz --strip-directories 1 --ignore-errors
# Neither '--strip-directories' nor '--ignore-errors' is a GNU tar or bsdtar option.
# The real option for dropping leading path components is '--strip-components'.

docker run --memory-limit 1G ubuntu
# It is '--memory', not '--memory-limit'.
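
A suspicious flag can often be checked against the tool's own help text. The following is a rough sketch, assuming the tool prints its options when called with --help (not every tool does) and that the flag contains no regex metacharacters; flag_documented is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: does the command's own --help text mention this flag?
# Usage: flag_documented FLAG COMMAND [SUBCOMMAND...]
flag_documented() {
    local flag="$1"
    shift
    # Require a non-flag character on both sides so that --memory
    # does not match inside --memory-swap.
    "$@" --help 2>&1 | grep -qE -e "(^|[^[:alnum:]-])${flag}([^[:alnum:]-]|\$)"
}

# Example call, using the docker flag discussed in this section:
# flag_documented --memory-limit docker run || echo "flag not found in help output"
```

A missing match is a reason to read the man page, not proof that the flag is invalid, because some tools document options only there.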

🔍 Automated Verification Strategies

You can build scripts to verify if the commands and flags generated by the AI actually exist on your system.

The "Dry Run" Syntax Check

Before running any AI script, use the shell's built-in syntax checker.

# Checks syntax without executing. Catches mismatched quotes, unterminated loops, etc.,
# but not nonexistent commands or flags.
bash -n ai_script.sh
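
The syntax check pairs naturally with shellcheck, which the summary checklist also recommends. A minimal sketch, assuming shellcheck may or may not be installed; lint_script is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical helper: run the syntax check, then shellcheck when it is installed.
lint_script() {
    local script="$1"
    bash -n "$script" || return 1                  # parse only, nothing is executed
    if command -v shellcheck >/dev/null 2>&1; then
        shellcheck "$script" || return 1           # static analysis for common bugs
    fi
}

# Example call:
# lint_script ai_script.sh && echo "no syntax or lint errors"
```

Neither tool knows which flags a given command accepts, so a clean result here does not rule out hallucinated options.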

Automated Command Existence Checker

Use this snippet to check that the command at the start of each line of an AI script actually exists in your PATH.

#!/bin/bash
# verify-commands.sh - Checks if commands in a script exist

SCRIPT="${1:?Usage: verify-commands.sh <script>}"

# Extract the first word of every line, ignoring comments, assignments, and empty lines.
# Note: This is a heuristic and not a perfect parser. It only sees the first word
# of each line, so commands after a pipe, &&, or inside $(...) are not checked.
# Keywords (if, do, done, ...) are skipped by the type -t check below, because a
# prefix match on "do" or "if" would also drop commands such as docker or ifconfig.
sed -E 's/^[[:space:]]+//' "$SCRIPT" | \
    grep -vE '^(#|$|[A-Za-z_][A-Za-z0-9_]*=)' | \
    awk '{print $1}' | \
    sort -u | \
    while read -r cmd; do

        # Skip shell builtins and keywords
        if type -t "$cmd" 2>/dev/null | grep -qE 'keyword|builtin'; then
            continue
        fi

        # Check if external command exists
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "⚠️ WARNING: Hallucination detected! Command not found: $cmd"
        else
            echo "✅ Validated: $cmd"
        fi
    done

🧠 Prompting to Prevent Hallucinations

The best way to handle hallucinations is to prevent them through strict prompt engineering.

Add the "Verification Constraint" to your prompts:

"Do not invent any flags. Before using a flag for tar, sed, awk, find, or date, verify mentally that it is strictly POSIX compliant. If you must use a GNU or BSD specific flag, add a comment explaining why."

Add the "Man Page" approach:

"Act as a strict parser. I need a command to parse JSON. Only use jq features documented in the official jq 1.6 manual."
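
The constraint is easier to apply consistently when the prompt is assembled by a script that also states the target system. A minimal sketch, assuming bash and uname are available; build_prompt is a hypothetical helper and the constraint text is shortened from the one quoted above.

```shell
#!/bin/bash
# Hypothetical helper: prefix a task with facts about the target system
# and a shortened form of the verification constraint.
build_prompt() {
    local task="$1"
    printf 'Target OS: %s\n' "$(uname -s)"
    printf 'Shell: bash %s\n' "${BASH_VERSION:-unknown}"
    printf '%s\n' "Do not invent any flags. If you must use a GNU or BSD specific flag, add a comment explaining why."
    printf 'Task: %s\n' "$task"
}

# Example call, with llm standing in for whatever LLM CLI you use:
# build_prompt "Delete log files older than 7 days" | llm
```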


🛠️ The "Help/Version" Context Injection

When asking an AI to write a complex command wrapper, inject the tool's --help output into the prompt so the LLM has grounded context.

# Ask AI to write an ffmpeg command, but give it the exact help output first
(echo "Write an ffmpeg command to convert an mp4 to a webm optimized for web."; \
 echo "Here is my ffmpeg version and help output to ground you:"; \
 ffmpeg -h 2>&1) | llm
This greatly reduces flag hallucinations, because the LLM can ground its output in the provided text. It does not guarantee a correct command, so still check the result before running it.


🧾 Summary Checklist

✅ Beware of OS drift: Always verify sed, awk, find, and date commands.
✅ Verify JSON/Text tools: Double-check jq and awk syntax for invented functions.
✅ Dry-Run everything: Always use bash -n and shellcheck.
✅ Inject context: Feed --help or man outputs directly to the LLM to ground its knowledge.


🧾 See Also