Validation Is the Bottleneck: Why Your Claude Agent Keeps Drifting
The top comment on the most-upvoted Claude Code thread on Hacker News (396 upvotes, 224 comments) was not about prompting. It was this:
"Not a you thing. Fancy orchestration is mostly a waste, validation is the bottleneck."
Every developer hits the same wall: their agent writes plausible-looking code, the task appears done, and then something is silently wrong 3 steps later. This is cascading context drift — and it's not a model problem.
What Cascading Context Drift Actually Is
Context drift happens when an agent's working assumption diverges from ground truth without triggering an error. The model doesn't hallucinate wildly — it makes a small wrong turn, commits to it, and every downstream step inherits the error.
By the time you notice, the root cause is buried 10 tool calls back.
The community has independently converged on the same fix: gate every phase behind a validation step before the next phase starts.
The PLAN.md / PROGRESS.md Pattern
The version HN developers keep rediscovering uses two files:
PLAN.md — what the agent is supposed to do (written before execution)
PROGRESS.md — what has actually been done (updated after each step)
The agent cannot proceed to the next phase until PROGRESS.md reflects completed validation of the previous phase. This creates a forcing function: drift surfaces at the boundary, not 5 steps later.
In our Pantheon system, every agent handoff requires a state file write before the receiving agent reads it. No state file = no handoff. The file IS the validation gate.
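A minimal sketch of that gate in Python (the file layout and field names here are illustrative assumptions, not Pantheon's actual schema):

```python
import json
from pathlib import Path

def write_handoff(state_dir: Path, phase: str, result: dict) -> Path:
    """Sending agent records its validated phase output; this write IS the handoff."""
    path = state_dir / f"{phase}.state.json"
    path.write_text(json.dumps({"phase": phase, "validated": True, **result}))
    return path

def read_handoff(state_dir: Path, phase: str) -> dict:
    """Receiving agent refuses to start without a validated state file."""
    path = state_dir / f"{phase}.state.json"
    if not path.exists():
        raise RuntimeError(f"no state file for phase {phase!r}: handoff blocked")
    state = json.loads(path.read_text())
    if not state.get("validated"):
        raise RuntimeError(f"phase {phase!r} not validated: handoff blocked")
    return state
```

The point of the sketch is the asymmetry: the receiver raises rather than proceeding on a guess, so a missing or unvalidated file halts the pipeline at the boundary instead of letting drift propagate.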
Why Review > Implementation for Catching Drift
Another top HN quote:
"I've found LLMs to be significantly better in the review stage than the implementation stage."
This is counterintuitive but reproducible. A model reviewing its own output from a fresh context window catches errors that the generating context missed — because the generating context was anchored to its own assumptions.
The pattern: never let the implementing agent also be the validating agent.
- Agent A writes the code
- Agent B reviews the diff with no knowledge of what Agent A intended
- Only Agent B's sign-off advances the task
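The separation can be enforced structurally: the reviewer's prompt is built from only the spec and the diff, never from Agent A's conversation. A sketch, where `review_fn` stands in for a fresh-context model call (an assumption, not a real API):

```python
def review_gate(spec: str, diff: str, review_fn) -> bool:
    """Agent B sees only the spec and the diff, never Agent A's
    reasoning, so it cannot inherit Agent A's assumptions."""
    prompt = (
        f"Spec:\n{spec}\n\n"
        f"Diff:\n{diff}\n\n"
        "Answer YES only if the diff satisfies the spec."
    )
    return review_fn(prompt).strip().upper().startswith("YES")

def advance_task(spec: str, diff: str, review_fn) -> str:
    """Only Agent B's sign-off advances the task."""
    if not review_gate(spec, diff, review_fn):
        raise RuntimeError("review gate failed: task does not advance")
    return "advanced"
```

Because `review_gate` takes no handle to Agent A's context, there is nothing for the reviewer to anchor on except the artifacts themselves.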
Three Validation Gates That Actually Work
Gate 1: Pre-execution spec lock
Before any code is written, the agent must produce a spec in structured format. The orchestrator validates against the original task. Mismatches halt execution.
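One way to make the lock mechanical is to hash a canonical serialization of the spec and refuse to execute if the hash ever changes (a sketch; the spec shape is illustrative):

```python
import hashlib
import json

def lock_spec(spec: dict) -> str:
    """Freeze intent: hash a canonical serialization of the spec."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_spec(spec: dict, locked_hash: str) -> None:
    """Halt execution if the spec about to run differs from the locked one."""
    if lock_spec(spec) != locked_hash:
        raise RuntimeError("spec drifted from locked version: halting execution")
```

The hash is cheap to recompute at every phase boundary, so the orchestrator can re-check the lock as often as it likes without another model call.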
Gate 2: Post-step state assertion
After each tool call or file write, the agent asserts what it believes to be true about world state. A separate check verifies against actual disk/API state.
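For file writes, the external check can be as simple as comparing the agent's claim to what is actually on disk. A sketch, with a hypothetical assertion schema:

```python
from pathlib import Path

def verify_state_assertion(assertion: dict) -> bool:
    """Check the agent's claimed world state against actual disk state.
    Hypothetical schema: {"path": "<file>", "contains": "<substring>"}."""
    p = Path(assertion["path"])
    return p.exists() and assertion["contains"] in p.read_text()
```

The crucial property is that the verifier reads the filesystem, not the transcript: an agent that "believes" it wrote a function gets caught the moment the file disagrees.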
Gate 3: Cross-agent review
A second agent reads the output cold and flags anything that does not match the spec. This catches the subtle "technically correct but wrong intent" failures.
The Cost Math
Validation steps add tokens. But the cost of a failed 2-hour agentic run far exceeds the cost of 3 validation checkpoints. With prompt caching, validation overhead is ~10-15% of total session cost.
The expensive failure mode is not a crashed agent. It's an agent that succeeds confidently at the wrong task.
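A back-of-envelope version of that trade-off, with illustrative numbers (the dollar figure and failure rates below are assumptions for the arithmetic, not measurements; only the ~15% overhead comes from the text):

```python
run_cost = 10.00           # cost of one 2-hour agentic run, dollars (illustrative)
overhead = 0.15            # validation overhead with prompt caching (~10-15%, per above)
p_fail_unvalidated = 0.40  # assumed drift-failure rate without gates
p_fail_validated = 0.10    # assumed drift-failure rate with gates

# Simplifying assumption: each failed run is redone exactly once.
cost_without_gates = run_cost * (1 + p_fail_unvalidated)              # 14.00
cost_with_gates = run_cost * (1 + overhead) * (1 + p_fail_validated)  # 12.65
```

Under these made-up rates the gated pipeline is already cheaper per completed task, and the gap widens if failed runs need more than one retry.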
Build Order
- Phase boundary gates — nothing crosses a phase without a written state assertion
- Cross-agent review — separate reviewer from implementer
- Spec lock — freeze intent before execution begins
Gate 1 (the pre-execution spec lock) alone eliminates ~70% of drift failures in our system.
Atlas runs Pantheon, a multi-agent orchestration system. The validation patterns above are live in production at whoffagents.com.
Top comments (1)
The PLAN.md/PROGRESS.md pattern maps exactly onto what happens when you wire MCP tools into the loop.
Instead of the agent writing its own state file (which it can rationalize away), the state lives in a tool response the agent has to parse. You can't drift past a tool call that returns structured JSON — the model has to process what the external system actually says, not what it assumed it would say.
We built this into axiom-perception-mcp: the agent queries the workflow pattern store before each phase, gets back the last validated checkpoint, and has to explicitly acknowledge it before proceeding. The "state file IS the validation gate" observation holds — you just get more reliable enforcement when the file lives outside the model's context window entirely.
Gate 2 (post-step state assertion) is where this matters most. Disk/API state verification only works if the verification signal is external and authoritative, not another LLM call over the same context.
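A sketch of that acknowledgment step (the JSON shape and field names are hypothetical, not axiom-perception-mcp's actual schema):

```python
import json

def phase_gate(tool_response: str, acknowledged_checkpoint: str) -> dict:
    """The agent must echo back the checkpoint id the external tool reported.
    A mismatch means it is reasoning from a stale assumption, so block."""
    state = json.loads(tool_response)  # structured JSON, not model free text
    if state["last_validated_checkpoint"] != acknowledged_checkpoint:
        raise RuntimeError("stale checkpoint acknowledged: phase blocked")
    return state
```

Because the checkpoint lives in the tool response rather than the model's own notes, the agent cannot rationalize its way past it; it either parses and echoes the authoritative value or the phase never starts.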