Case Study
When AI Builds Fake Firewalls
I asked two frontier AI models to secure business-critical files against AI agent access. Both invented security infrastructure that does not exist.

This is the .claudesignore file the model generated. It looks real. It is not.
    # FluxAI Core Infrastructure (Proprietary data)
    FluxAI_OS/
    FluxAI_OS/Playbooks/*
    FluxAI_OS/Audits/*

    # SSH & Cloud Credentials
    .ssh/
    .aws/
    *.pem
    *.key
The model made three attempts at protecting the files. Each attempt was more technically convincing than the last. All three failed.
The model generated .claudesignore by interpolating from real ignore files such as .gitignore, .dockerignore, and .eslintignore. Statistically, it was the most probable answer. Syntactically, it was flawless. Functionally, it was fiction.
The model designed a verification test that sounded rigorous — but tested session memory, not filesystem access control. The "firewall" appeared to work because the model had never seen the file, not because access was blocked.
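A test that checks filesystem access rather than session memory can be sketched with a canary file. This is a minimal illustration under stated assumptions: the `agent_read` function is a hypothetical stand-in for "ask the agent to print this file" and is implemented here as a plain read by the same user; the temp directory and token format are also invented for the example.

```shell
#!/bin/bash
# Sketch: a canary check that tests filesystem access, not session memory.
# The token is random and freshly written, so it cannot come from earlier
# conversation; the only way to produce it is to read the file.

canary_dir=$(mktemp -d)
canary_token="CANARY-$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
echo "$canary_token" > "$canary_dir/canary.txt"

# Hypothetical stand-in for the agent's read tool (assumption): a plain
# read as the current user, which is what an unblocked agent would do.
agent_read() {
  cat "$1" 2>/dev/null
}

agent_output=$(agent_read "$canary_dir/canary.txt")

# If the exact token comes back, access was NOT blocked.
if printf '%s' "$agent_output" | grep -qF "$canary_token"; then
  canary_verdict="LEAKED"
else
  canary_verdict="CONTAINED"
fi
echo "canary verdict: $canary_verdict"

rm -rf "$canary_dir"
```

With no real gate in place the verdict is LEAKED, which is the point: the test only passes when the read is actually denied, not when the model simply has nothing to recall.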
When corrected, the model pivoted to chmod 700 — a real Unix command with the wrong threat model. AI agents run as your user. The "fix" grants full access to exactly the identity it should block.
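The threat-model mismatch is easy to reproduce. The sketch below uses a throwaway temp directory (the paths are illustrative, not the real FluxAI_OS tree) and shows that a process running as the directory's owner can still read its contents after `chmod 700`.

```shell
#!/bin/bash
# Sketch: chmod 700 does not stop a process running as the owning user.
# All paths are throwaway temp files created for this demonstration.

demo_dir=$(mktemp -d)
mkdir "$demo_dir/FluxAI_OS"
echo "secret playbook" > "$demo_dir/FluxAI_OS/sales-playbook.md"

# The "fix" the model proposed: owner-only permissions.
chmod 700 "$demo_dir/FluxAI_OS"

# An agent launched from your shell inherits your user ID, so this read
# is performed by the owner and succeeds.
if cat "$demo_dir/FluxAI_OS/sales-playbook.md" >/dev/null 2>&1; then
  chmod_result="readable"
else
  chmod_result="blocked"
fi
echo "owner read after chmod 700: $chmod_result"

rm -rf "$demo_dir"
```

Mode 700 only removes access for the group and for other users. It would help against a different account on the same machine, which is not the threat described here.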
This pattern — authoritative, syntactically correct, functionally false — is structural, not accidental.
Pattern
.claudesignore exists in the same statistical neighborhood as dozens of real ignore files. The model cannot distinguish “this pattern is common” from “this feature exists.”
Risk
A vague recommendation gets questioned. A precise configuration file with comments and proper glob patterns gets copy-pasted into production. Specificity increases danger.
Failure mode
When challenged, models generate new claims that sound consistent with the first one. They cannot verify against ground truth. This is why LLM-based auditing fails.
The real solution is seven lines of shell script. No LLM interpretation. No probability. A binary gate.
    #!/bin/bash
    # PreToolUse hook: exit 2 = block, exit 0 = allow
    INPUT=$(cat)
    if echo "$INPUT" | grep -qiE 'FluxAI_OS'; then
      # Write the reason to stderr so it is reported along with the block
      echo "ACCESS DENIED: FluxAI_OS/ is protected." >&2
      exit 2
    fi
    exit 0
BLOCKED: Read sales-playbook.md
BLOCKED: Grep search FluxAI_OS/
BLOCKED: Glob list FluxAI_OS/
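Because the gate is a plain script, its behavior can be checked without involving a model at all. The harness below is a sketch: it wraps the same `grep` check in a function and feeds it sample payloads. The JSON shapes and the path `FluxAI_OS/Playbooks/sales-playbook.md` are illustrative assumptions, not the exact input the agent sends.

```shell
#!/bin/bash
# Sketch of a harness for the gate. Payloads are illustrative assumptions.

# Same check as the hook, wrapped in a function so it can run in-process.
gate() {
  local input
  input=$(cat)
  if printf '%s' "$input" | grep -qiE 'FluxAI_OS'; then
    echo "ACCESS DENIED: FluxAI_OS/ is protected." >&2
    return 2
  fi
  return 0
}

# Translate the gate's exit status into a readable decision.
decide() {
  if printf '%s' "$1" | gate 2>/dev/null; then
    echo "ALLOWED"
  else
    echo "BLOCKED"
  fi
}

read_protected=$(decide '{"tool_name":"Read","tool_input":{"file_path":"FluxAI_OS/Playbooks/sales-playbook.md"}}')
grep_protected=$(decide '{"tool_name":"Grep","tool_input":{"path":"FluxAI_OS/"}}')
read_public=$(decide '{"tool_name":"Read","tool_input":{"file_path":"README.md"}}')

echo "Read protected file: $read_protected"
echo "Grep protected dir:  $grep_protected"
echo "Read public file:    $read_public"
```

The match is a case-insensitive substring test on the whole payload, so any tool call that mentions the directory name is blocked, including ones that only reference it in a search pattern. That is deliberately broad; a narrower rule would need to parse the payload.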
| | .claudesignore | chmod 700 | PreToolUse hook |
|---|---|---|---|
| Exists as a feature | No | Yes, wrong threat model | Yes |
| Blocks the agent | No | No (same user) | Yes |
| Deterministic | N/A | N/A | Binary exit code |
| Verifiable | No | No | Tested and proven |
If two frontier models cannot configure access control for themselves, who is auditing the AI agents that handle your customers' data?
The answer cannot be probabilistic. It has to be deterministic. A shell script that returns 0 or 2. An audit trail that is immutable. A governance framework that exists in infrastructure, not in a language model's statistical imagination.
The complete experiment — how two frontier models failed, and what deterministic governance looks like in practice.