Garbage in the Loop

2026-06-24

Tagged: llms

Since Opus 4.5 came out last November, my daily-driver experience with coding agents has felt stagnant. It’s not that AI has plateaued; it continues to accomplish feats that I wouldn’t have thought possible a year ago: codebase rewrites pinned to extensive unit tests, mathematical theorem proving, security exploit discovery, autoresearch, etc. I am starting to believe that I am the bottleneck: not because I have to sit there and prompt the agent, but because the agent has to untangle my sloppy prompting.

In this essay, I consider where we should focus our efforts now that frontier models are no longer the dominant source of garbage.

A garbage loop example

“Garbage in the loop” is a fusion of two phrases: “garbage in, garbage out”, and the agentic loop.

A typical garbage loop (early-2026) might go as follows:

1. I ask the agent: “I’m seeing bug X on system Y. It’s coming from the Z module, go investigate and fix.”
2. Agent greps the monorepo for Z and comes up with 4 search results, only one of which is in system Y. But system Y’s name isn’t part of the filepath, so the agent doesn’t know which one is correct. Plausibly, all of the results might be different parts of system Y.
3. It wastes some time digging further before finding the right module Z.
4. Along the way, the agent has learned some plausible concepts about system Y’s doppelgangers that don’t actually exist in system Y.
5. It misdiagnoses the error, hallucinating some chain of causality that would make sense if those other concepts actually existed – but they don’t.
6. It invents a fix, tests it, and declares the task complete.
7. When I tell it that I’m still seeing the error, it starts inventing epicycles upon epicycles to explain away the discrepancy.
8. I give up, clearing the session and reprompting with the right filepath.

The original garbage was my insufficiently precise reference to system Y, and the agentic loop failed to recover.

Here are the possible intervention points:

- User: I could have skipped Y and Z entirely and just referenced the URL of the page where I saw the error. Alternatively, I could have named the exact filepath to system Y, or changed directories into system Y so that the agent’s search returned the right results.
- Harness: the harness could have indicated something about me and the systems I usually work on, to help the agent disambiguate.
- Environment: the environment could be cleaned up so that systems are better documented, either self-documented through better naming or with an AGENTS.md that flags common confusions.
- Model: frontier labs could train smarter models that, at higher reasoning levels, spend more time working out the relationship (or non-relationship) between the search results and test their hypotheses more thoroughly.
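To make the first intervention concrete, here is a minimal sketch of why scoping the search matters. The toy monorepo layout, the directory names, and the assumption that system Y lives under services/checkout are all hypothetical; the point is only that a repo-wide search returns several same-named candidates while a scoped one returns a single answer.

```python
from pathlib import Path
import tempfile


def find_module_dirs(root: Path, module_name: str) -> list[Path]:
    """Return every directory under root whose name equals module_name."""
    return sorted(p for p in root.rglob(module_name) if p.is_dir())


def build_demo_repo(root: Path) -> None:
    """Create a toy monorepo in which four directories are all called 'z'."""
    # Hypothetical layout: none of these paths mention "system Y" by name.
    for parent in ("services/checkout", "services/billing", "libs/legacy", "tools/scripts"):
        (root / parent / "z").mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    build_demo_repo(repo)

    # Repo-wide search: four candidates, and nothing says which is system Y's.
    repo_wide = find_module_dirs(repo, "z")

    # Scoped search: assume (for illustration) system Y is services/checkout.
    scoped = find_module_dirs(repo / "services" / "checkout", "z")

    print(len(repo_wide), "candidates repo-wide;", len(scoped), "when scoped")
```

Changing directories before prompting has the same effect as passing the narrower root here: the ambiguity is removed before the agent ever sees it.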

Division of Responsibilities

User, harness, environment, and model. Whose responsibility is it to fix these garbage loops?

One extreme take is that as models get smarter, they’ll just figure it out themselves, given sufficient tokens and a memory device of some sort. Any attempt to short-circuit this process would be a violation of The Bitter Lesson, which warns us that building knowledge into our agents is effective in the short run and actively harmful in the long run.

The antipodal stance (call it the Babbage take) is that models should restrict themselves to doing exactly what they’re asked, and that it’s the user’s responsibility to specify exactly what they want done.

The theory of risk compensation suggests that as agents become more capable, we will push them right up to the very precipice of usefulness. Thus, we end up with a very rough equation:

User prompt quality + harness quality + environment quality + model quality = agent quality = task scope

That’s the equation if we assume all four parties are working towards improving their game. Otherwise, we might have something closer to the following:

Harness quality + model quality = agent quality = task scope + tolerance to bad user prompt + tolerance to bad environment

You won’t be able to tackle more ambitious tasks if the user slurps up all improvements in agent quality by being worse at prompting, or if the codebase/information environment fills up with slop.
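Rearranging the second equation makes the trade-off explicit. The symbols are shorthand introduced here only for illustration: H for harness quality, M for model quality, S for task scope, and T_u and T_e for tolerance to bad user prompts and to a bad environment. Treating these as additive quantities is the same rough simplification as in the equations above.

```latex
\begin{align*}
H + M &= S + T_u + T_e \\
S &= H + M - T_u - T_e \\
\Delta S &= (\Delta H + \Delta M) - (\Delta T_u + \Delta T_e)
\end{align*}
```

If the tolerance terms grow as fast as harness and model quality, so that \Delta T_u + \Delta T_e = \Delta H + \Delta M, then \Delta S = 0 and task scope does not move.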

Garbage ontology

Stepping back, here’s a more complete list of garbage sources:

User
- I told the agent to use a wrong approach (forgot about a constraint, assumed a fix would be simple but it wasn’t, picked a bad solution).
- I prompted in the wrong environment (the wrong agent conversation, the wrong SSH terminal, the wrong git branch/worktree/checkout).
- I gave an underspecified prompt which could quite reasonably be interpreted differently.
- I maintained one very long multi-topic session, which contained instructions that were only correct with respect to previous topics.
- I passed in unprocessed input as a prompt (e.g. raw user feedback instead of curated Jira tickets).
Harness
- System prompt was fixing deficiencies in past models that are irrelevant to this generation.
- System reminder was injected at the wrong time.
- Too many tools/MCP servers/skills.
- Bad handoff during compaction or subagent prompting.
Environment
- System prompt or skill gave generic advice that was inappropriate for this task.
- Stale AGENTS.md, code comment, design doc, outdated Slack conversation, etc.
- LLM slop generated from previous sessions (code, skills, docs, etc.).
- Tool call returned too much, too little, or incorrect/misleading output.
Model
- Bad reasoning traces.
- Hallucinations.
- Regression to the internet mean.

Garbage collection

User-generated garbage

Users should do their best to match their language to their level of confidence in each part of the prompt. This helps the agent work out which parts of the prompt might be garbage and are safer to ignore. In my garbage loop example, if I had expressed less confidence about where the bug was coming from, the agent would likely have spent more iterations confirming the source of the bug, instead of assuming that my diagnosis of system Y / module Z was correct. More generally, casual language might steer the LLM towards a lazier, good-enough approach, while formal specification language might steer the LLM to nail each detail. (This is loosely related to what researchers call “eval awareness”.)

You should also work at the right task size. There’s a well-known finding in software that bug counts are roughly proportional to line count, independent of programming language. When you manually write out lines of C, you have far more opportunities to make some silly memory-management error than if you write the same code in Python and let the language handle memory for you. If the human is statistically a garbage source, then the human should amortize their garbage production by taking larger steps and letting the LLM handle the details.

Harness-generated garbage

In harnesses that live through multiple generations of models, you inevitably end up with stale system prompts and guidance meant to correct bad behavior from previous generations: “NEVER DO THIS!”, “ALWAYS read X.md before doing Y”, “DO NOT ASK MORE THAN ONE QUESTION AT A TIME”. This type of prompting was absolutely necessary for previous generations of LLMs, each of which had its own annoying quirks. Today’s LLMs also have annoying quirks, but different, more subtle ones. With the latest models:

- Leave some wiggle room. Instead of “Never do this!”, say “Users dislike this!”. Instead of “Follow the instructions in X.md”, say “X.md contains guidance for doing Y effectively.”
- Randomly delete parts of your system prompt every so often. The burden of proof should fall on “why should we add this back?” rather than on “why should we delete this?”

Rely less on human-built heuristics for when to inject content or system reminders, and rely more on the agent to know when to read a skill or find content. When you do inject system reminders, again, leave some wiggle room. Anthropic’s system reminders to Claude demonstrate this principle well.
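The random-deletion idea above can be sketched in a few lines. This is a minimal illustration, not a recommended tool: the function name, the section texts, and the 25% drop fraction are all hypothetical, and deciding whether a dropped section deserves to come back still requires comparing agent behavior with and without it.

```python
import random


def ablate_prompt(
    sections: list[str], drop_fraction: float, seed: int
) -> tuple[list[str], list[str]]:
    """Randomly split prompt sections into (kept, dropped).

    A fixed seed makes the ablation reproducible, so a regression can be
    traced back to the exact sections that were removed.
    """
    rng = random.Random(seed)
    n_drop = round(len(sections) * drop_fraction)
    dropped_idx = set(rng.sample(range(len(sections)), n_drop))
    kept = [s for i, s in enumerate(sections) if i not in dropped_idx]
    dropped = [s for i, s in enumerate(sections) if i in dropped_idx]
    return kept, dropped


# Hypothetical system prompt, already split into independent sections.
sections = [
    "Users dislike being asked several questions at once.",
    "X.md contains guidance for doing Y effectively.",
    "Prefer small, reviewable diffs.",
    "Run the test suite before declaring a task complete.",
]

kept, dropped = ablate_prompt(sections, drop_fraction=0.25, seed=0)
system_prompt = "\n".join(kept)

# Keep the dropped sections around: they only return if someone can
# show that the agent got worse without them.
print("Dropped this round:", dropped)
```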

Environmental garbage

Agents can do some garbage collection at runtime, but it usually costs them some additional turns, context searching, and tokens to do so. The blindingly obvious thing to do here is to fold these runtime discoveries back into the source material, so that the agent doesn’t have to repeat its work the next time.
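As a minimal sketch of folding a runtime discovery back into the source material, the helper below appends a note to an AGENTS.md file unless the same note is already there. The function name, the note text, and the one-bullet-per-discovery layout are assumptions made for illustration; AGENTS.md itself does not mandate any particular structure.

```python
from pathlib import Path
import tempfile


def record_discovery(notes_file: Path, note: str) -> bool:
    """Append note as a bullet to notes_file unless it is already present.

    Returns True if the file was changed, False if the note was a duplicate.
    """
    existing = notes_file.read_text(encoding="utf-8") if notes_file.exists() else ""
    bullet = f"- {note.strip()}"
    if bullet in existing.splitlines():
        return False
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    notes_file.write_text(existing + separator + bullet + "\n", encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    agents_md = Path(tmp) / "AGENTS.md"
    # Hypothetical discovery from the garbage loop example.
    note = "Several directories are named z; only one of them belongs to system Y."

    first = record_discovery(agents_md, note)   # writes the note
    second = record_discovery(agents_md, note)  # duplicate, file unchanged
    contents = agents_md.read_text(encoding="utf-8")

print(first, second)
print(contents)
```

Deduplicating on exact text is deliberately naive. It stops the file from growing on repeated runs, but it does nothing about notes that have gone stale, which is the harder curation problem.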

The jury is still out on the most effective way to consolidate these discoveries. Claude Code has “dreaming”; OpenAI’s Codex has a less fully baked memories feature. Karpathy has LLM wiki, a Wikipedia-like system that agents can crawl through and manage autonomously. The older SKILL.md and AGENTS.md formats are of course also good vessels for context.

It’s not obvious to me that agents are actually good long-term curators of codebases or, more generally, of the information environment. Here, there is lots of room for humans to participate: our culture has a long tradition of compressing knowledge and accelerating our own learning curve.

Oracles

Oracles, in the computer science sense, are systems that can tell the agent whether it’s making progress or not. An oracle might be a set of unit tests, compilation success, performance metrics, or more broadly, any flavor of formal verification system. Every accomplishment I noted in the intro (codebase rewrites, theorem proving, security exploit discovery, autoresearch) uses an oracle.
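In its simplest form, an oracle of this kind is just a command whose exit code says pass or fail. The sketch below assumes that convention (exit code 0 means success), which holds for most test runners and compilers; the function name and the two toy commands are hypothetical stand-ins for a real test suite or build step.

```python
import subprocess
import sys


def run_oracle(command: list[str], timeout_s: float = 60.0) -> tuple[bool, str]:
    """Run command and return (passed, combined stdout and stderr).

    Passed means the process exited with code 0 within the timeout.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} seconds"
    return result.returncode == 0, result.stdout + result.stderr


# Toy stand-ins for "run the unit tests": one check passes, one fails.
passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing = [sys.executable, "-c", "assert 1 + 1 == 3, 'arithmetic is broken'"]

ok, _ = run_oracle(passing)
bad, log = run_oracle(failing)

print(ok, bad)
# The captured log is what the agent reads to decide its next step.
print(log)
```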

Oracle design is difficult, and correctness is very much an exercise for the user. Bad oracles suffer from many of the same pitfalls as misspecified reinforcement learning objective functions, although agents are often smart enough not to simply hack your bad objective function, as long as you give them the wiggle room to exercise their judgment.

Oracles are, in many ways, convenient prompting shortcuts. Instead of having to formally specify what you want in human language, you get to specify what you want by reference to another system or artifact. For example, math theorems are compact and precise prompts when interpreted in the context of a long tradition of mathematical language. Other oracles are references to black boxes – for example, “fuzz this binary to generate a compliance test suite, and then replicate its behavior in a new language”. The environment is another flavor of black box: “make this code go faster/use less memory”.
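The black-box case can be sketched as differential testing: feed the same random inputs to the reference and to the rewrite, and report any disagreement. Here the “binary” is stood in for by Python’s built-in sorted, and both candidate implementations are hypothetical. A real setup would invoke the binary itself and would need an input generator suited to its format.

```python
import random
from typing import Callable


def differential_oracle(
    reference: Callable[[list[int]], list[int]],
    candidate: Callable[[list[int]], list[int]],
    make_input: Callable[[random.Random], list[int]],
    trials: int = 200,
    seed: int = 0,
) -> list[list[int]]:
    """Return the generated inputs on which candidate disagrees with reference."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        x = make_input(rng)
        # Pass copies so neither function can mutate the other's input.
        if candidate(list(x)) != reference(list(x)):
            mismatches.append(x)
    return mismatches


def random_list(rng: random.Random) -> list[int]:
    """Short lists of small integers, so duplicates are common."""
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]


def insertion_sort(xs: list[int]) -> list[int]:
    """A correct reimplementation of the reference behavior."""
    out: list[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def dedup_sort(xs: list[int]) -> list[int]:
    """A subtly wrong rewrite: it silently drops duplicate values."""
    return sorted(set(xs))


good = differential_oracle(sorted, insertion_sort, random_list)
bad = differential_oracle(sorted, dedup_sort, random_list)
print(len(good), "mismatches for insertion_sort;", len(bad), "for dedup_sort")
```

The oracle never states what sorting means; it only points at the reference, which is the prompting shortcut described above.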

Conclusion

A year ago, this essay would have looked dramatically different. It would have been about how harnesses can be designed to compensate for LLM-generated garbage by providing human-curated context. Environmental garbage wouldn’t have even been on anyone’s radar, and people had much stricter prompting styles. Today, the models have gotten good enough that user/harness/environment are the bottlenecks on agent quality. In the future, agents will be bottlenecked by new issues (adversarial environments are my best guess).