AI Shell: Hallucination Detection
A "hallucination" in the context of AI-assisted shell scripting occurs when the LLM invents non-existent flags, mixes up GNU and BSD toolsets, or uses outdated syntax. Detecting these hallucinations before execution is critical.
Common Shell Hallucinations
LLMs frequently hallucinate in specific, predictable areas. Be highly suspicious of the following:
1. The GNU vs BSD Mix-up
The most common hallucination. The LLM assumes you have GNU tools on macOS/BSD, or vice versa.
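In-place editing with sed is a typical case: GNU sed accepts `-i` with no argument, while BSD sed (macOS, FreeBSD) requires a backup suffix argument, which may be empty. The sketch below avoids `-i` entirely by writing to a temporary file. The function name `replace_in_file` is hypothetical, and the sketch assumes `mktemp` is available (it is on GNU and BSD systems, although it is not part of POSIX).

```shell
# GNU sed:  sed -i 's/foo/bar/' file.txt
# BSD sed:  sed -i '' 's/foo/bar/' file.txt
# Portable workaround: skip -i and go through a temporary file.
replace_in_file() {
    # $1 = sed expression, $2 = file to edit (hypothetical helper)
    tmp=$(mktemp) || return 1
    if sed "$1" "$2" > "$tmp"; then
        # Copy the content back so the original file keeps its permissions.
        cat "$tmp" > "$2"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage: `replace_in_file 's/foo/bar/' config.txt` behaves the same on both toolsets because it relies only on plain `sed` reading a file and writing to standard output.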
2. Fake awk and jq Functions
LLMs often invent higher-level programming functions inside text processing tools.
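As an illustration, a generated awk one-liner may call `trim()`, which awk does not provide as a built-in. A similar pattern in jq is a made-up `sum` where the jq 1.6 manual documents `add` for summing an array. The sketch below shows one way to get the awk behaviour with a user-defined function built on the standard `gsub()`. The wrapper name `trim_lines` is hypothetical.

```shell
# Hallucinated: awk has no built-in trim(), so this fails.
#   awk '{ print trim($0) }' file.txt
# Working alternative: define the helper yourself using gsub().
trim_lines() {
    awk '
        # Strip leading and trailing spaces and tabs from s.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        { print trim($0) }
    '
}
```

Usage: `trim_lines < input.txt` reads standard input and prints each line without surrounding whitespace.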
3. Imaginary CLI Flags
LLMs will confidently append flags that sound logical but do not exist.
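One rough way to catch such a flag is to search the tool's own help text for it. The sketch below is a heuristic, not a reliable parser: it assumes the tool prints its options when called with `--help`, which many BSD utilities do not, and a plain substring match can produce false positives. The function name `flag_documented` is hypothetical.

```shell
flag_documented() {
    # $1 = command name, $2 = flag to look for (hypothetical helper)
    # -F treats the flag as a fixed string; -e protects the leading dash.
    "$1" --help 2>&1 | grep -qF -e "$2"
}
```

Usage: `flag_documented tar --exclude || echo "flag not found in help output"`. When the help text is missing or terse, checking the man page is the safer fallback.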
Automated Verification Strategies
You can build scripts to verify if the commands and flags generated by the AI actually exist on your system.
The "Dry Run" Syntax Check
Before running any AI script, use the shell's built-in syntax checker.
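A minimal sketch of that check follows. `bash -n` parses the script without executing it, so it catches syntax errors but not non-existent flags or commands. The sketch also runs shellcheck when it is installed. The function name `check_syntax` is hypothetical.

```shell
check_syntax() {
    # $1 = path to the script under review (hypothetical helper)
    # Parse only; nothing in the script is executed.
    bash -n "$1" || return 1
    # shellcheck is optional, so run it only when it is on PATH.
    if command -v shellcheck >/dev/null 2>&1; then
        shellcheck "$1"
    fi
}
```

Usage: `check_syntax ai_script.sh && echo "syntax looks fine"`.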
Automated Command Existence Checker
Use this snippet to verify that every command used in an AI script actually exists in your PATH.
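The original snippet is not reproduced here. What follows is a rough sketch of the idea under stated assumptions, not a full shell parser. It takes the first word of each line and of each segment after `|`, `;` or `&`, then asks `command -v` whether that word resolves. Quoted strings, heredocs and continuation lines can cause false positives. The function name `check_commands` is hypothetical.

```shell
check_commands() {
    # $1 = script to inspect (hypothetical helper, heuristic only)
    rc=0
    # Drop comments, split on | ; &, keep first words that look like names.
    for cmd in $(sed 's/#.*//' "$1" | tr '|;&' '\n\n\n' |
                 awk '$1 ~ /^[A-Za-z_][A-Za-z0-9_.-]*$/ { print $1 }' |
                 sort -u); do
        # command -v also succeeds for builtins and shell keywords.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'missing command: %s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `check_commands ai_script.sh` prints one line per unresolved command and returns a non-zero status if any are missing.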
Prompting to Prevent Hallucinations
The best way to handle hallucinations is to prevent them through strict prompt engineering.
Add the "Verification Constraint" to your prompts:
"Do not invent any flags. Before using a flag for tar, sed, awk, find, or date, verify mentally that it is strictly POSIX compliant. If you must use a GNU or BSD specific flag, add a comment explaining why."
Add the "Man Page" approach:
"Act as a strict parser. I need a command to parse JSON. Only use jq features documented in the official jq 1.6 manual."
The "Help/Version" Context Injection
When asking an AI to write a complex command wrapper, inject the tool's --help output into the prompt so the LLM has grounded context.
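One way to assemble such a prompt is sketched below. It assumes the tool responds to `--version` and `--help`. For tools that do not, the man page text is a reasonable substitute. The function name `build_prompt` and the prompt wording are hypothetical.

```shell
build_prompt() {
    # $1 = tool name, $2 = task description (hypothetical helper)
    printf 'Tool: %s\n' "$1"
    # Capture stderr too, since some tools print help or version there.
    printf 'Version output:\n%s\n' "$("$1" --version 2>&1 | head -n 5)"
    printf 'Help output:\n%s\n' "$("$1" --help 2>&1)"
    printf 'Task: %s\n' "$2"
    printf 'Use only the flags documented above.\n'
}
```

Usage: `build_prompt tar "Archive ./src without the .git directory" > prompt.txt`, then paste the file into the conversation so the model works from the options that exist on your system.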
Summary Checklist
✅ Beware of OS drift: Always verify sed, awk, find, and date commands.
✅ Verify JSON/Text tools: Double-check jq and awk syntax for invented functions.
✅ Dry-Run everything: Always use bash -n and shellcheck.
✅ Inject context: Feed --help or man outputs directly to the LLM to ground its knowledge.