What's in a codebase?
2026-03-23
Tagged: llms, strategy, software engineering
Does it ever make sense to rewrite your codebase from scratch?
For decades, the answer has been an unambiguous no, ever since Joel Spolsky argued that rewrites were the “single worst strategic mistake that any software company can make”.
In the era of coding agents, the cost of writing code has dramatically shifted, making it possible to rewrite your codebase from scratch, every week, if you really wanted to. But “possible” and “makes sense” are not the same. In this essay I explore the value of a codebase.
The compiler analogy
We’ve been here before - several times, actually. C codebases are ten times shorter than the assembly that they compile to, and the generated assembly code is worth approximately nothing compared to the C codebase. Decades later, Python codebases are ten times shorter than the equivalent C code, and few are weeping for the C codebases they replaced. A spec might be yet another ten times shorter than the Python code, with coding agents serving as the “compiler”.
At each level of compression, detail is necessarily lost (historically, the low-level implementation tricks required to extract maximally performant software). If you couldn’t tolerate that lossy compression, there was always the option of inlining assembly into C, or embedding C into Python. Today, coding agents fail to generate maximally simple code, often generating redundant copies of code or constructing torturous data flows instead of refactoring the underlying information architecture. Perhaps we’ll have to inline Python code into the spec.
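That escape hatch is already routine at the C-into-Python level. Here is a minimal sketch using the standard library’s ctypes to call libc’s strlen directly from Python, the kind of drop-down you reach for when the higher-level layer is too lossy or too slow. It assumes a Unix-like system where loading the current process exposes libc symbols:

```python
# Sketch: embedding C in Python via ctypes (assumes a Unix-like system
# where CDLL(None) exposes libc symbols; Windows would need a different path).
import ctypes

libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(s: str) -> int:
    """Byte length of the UTF-8 encoding of s, computed by C's strlen."""
    return libc.strlen(s.encode("utf-8"))
```

Real projects would more likely use cffi or a compiled extension module, but the shape of the escape hatch is the same: the hot or detail-sensitive part drops down a level, and the rest stays compressed.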
The coding agent works well as a decompression algorithm because it contains humanity’s collective knowledge of different coding patterns, algorithms, and techniques. You can invoke that knowledge with a single word – if you happen to know the right word. Agentic programmers of the future may have to learn an encyclopedia of programming patterns and techniques, and when each is applicable, to be effective at their jobs.
The compilation analogy extends even further - just as many build systems allow incremental recompilation of only the parts of your program that changed, you can imagine having an agent take a text diff of your updated spec and incrementally update an existing codebase, rather than rewriting from scratch.
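The “diff the spec” half of that loop is straightforward to sketch with the standard library’s difflib; the agent invocation itself is left entirely hypothetical here:

```python
# Sketch: incremental "recompilation" of a spec. Only the changed hunks
# of the spec are handed to a (hypothetical) coding agent, rather than
# regenerating the whole codebase.
import difflib

def spec_diff(old_spec: str, new_spec: str) -> str:
    """Unified diff between two versions of a spec."""
    return "".join(difflib.unified_diff(
        old_spec.splitlines(keepends=True),
        new_spec.splitlines(keepends=True),
        fromfile="spec.md@v1", tofile="spec.md@v2",
    ))

old = "Users log in with a password.\nSessions last 24 hours.\n"
new = "Users log in with a password or passkey.\nSessions last 24 hours.\n"
prompt = ("Update the existing codebase to reflect this spec change:\n"
          + spec_diff(old, new))
# `prompt` would then go to the agent along with access to the repo.
```

The interesting engineering is everything this sketch omits: mapping spec hunks to the regions of code they govern, which is the spec-level analogue of a build system’s dependency graph.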
Coding agents and specs
I’ve been using the word “spec” loosely, but what is a spec, actually?
One answer is an extensive test suite. We’ve seen a few examples of this already (vinext, chardet): given an exhaustive set of unit tests and API specs, an agent can rewrite the codebase, possibly in a completely different language or context. In response to these demos, some companies are considering pulling their unit tests from their open-sourced code – although an existing codebase can be fuzzed to regenerate a unit test suite, so you may as well pull the whole thing! SQLite is a notable outlier here - their test suite is 99.8% of their codebase, and they’ve kept it private since inception despite keeping the source code public.
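Regenerating a test suite by fuzzing is essentially characterization testing: run the existing code on generated inputs and record whatever it does as assertions. A minimal sketch, where legacy_slugify is a made-up stand-in for any pure function in the legacy codebase:

```python
# Sketch: regenerating a regression suite from an existing codebase by
# recording observed behavior on fuzzed inputs ("characterization tests").
import random

def legacy_slugify(title: str) -> str:  # stand-in for the existing codebase
    return "-".join(title.lower().split())

def characterize(fn, inputs):
    """Turn observed input/output pairs into executable assertion lines."""
    return "\n".join(
        f"assert {fn.__name__}({x!r}) == {fn(x)!r}" for x in inputs
    )

random.seed(0)
samples = [" ".join(random.choices(["Big", "Red", "Dog"], k=3))
           for _ in range(5)]
print(characterize(legacy_slugify, samples))
```

Real fuzzing adds coverage guidance and input minimization, but the output is the same kind of artifact: a test suite that pins down current behavior, bugs included, which is exactly what a rewriting agent needs.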
One notable failure of this approach is Anthropic’s C compiler exercise, in which the agent succeeded in writing a C compiler that compiled Linux for several architectures (wow!), but, lacking clean internal abstractions, it was unlikely to compile much else and had major performance shortcomings.
Perhaps what that attempt needed to complement the unit tests was a design doc, with key architectural decisions laid out. This would provide the core of the software, while the unit tests covered the periphery.
Still, we’re missing detail. What about comments, like “## This call is expensive - only invoke when X is true”, or the wisdom embedded within historical commit messages? What about the bugfixes, feature requests, and performance fixes recorded in issue trackers or version release notes? Q/A knowledgebases, FAQs, and user-facing manuals contain information about user-facing edge cases and their current or desired resolution. Simply scraping this content would be futile - only 1% of it would actually be valuable, and the rest would be obsolete, redundant with the spec, or mutually contradictory.
You could drop this level of detail from the spec and gain incredible feature velocity, but the result would be buggy, nonperformant software with only two nines of reliability. Maybe every developer in the world would use it anyway, who knows? shrug
Codebases coevolve with people
To expand the definition of “spec” even further, there are many ways in which even having the codebase as spec is still an underspecification.
Codebases exist alongside people: the engineers, of course, but also the on-call, the end user, the support team, and so on.
Software that’s often used on the go will develop tolerance to flaky internet connections. Software that’s used intimately by a small number of deep users will develop muscle memory for common workflows – with or without the developer’s help. The on-call will develop a playbook that addresses the specific ways in which the software tends to fail. The engineers will discover the easiest ways to test specific types of changes by wiring up a mix of production and development versions.
Internally, the codebase will have annealed by developing abstraction layers, encapsulating complexity, and clarifying key concepts. For infrastructural services, where requirements change more slowly and more investment goes into optimizing the algorithmic core, we often find a deeply elegant and minimal codebase. Simple codebases are more amenable to future changes, from both humans and agents.
The codebase is the central clearinghouse for everyone’s efforts, and it is, in the end, the most faithful spec of the product, simply because it is the reality that users and engineers are interacting with every day. It’s fundamentally why Joel argued against rewriting codebases.
Closing the loop on codebase rewrites
Let’s say for the sake of argument that we had a spec, written as some keyword-laden design doc, that captured all of the above complexity (or some acceptably lossy compression thereof).
How would you set up a coding agent to recompile this codebase from scratch?
I would imagine that the process would start with core data structures and API boundaries. A set of parallel agents can then implement within the API boundaries. Throughout the process, agents would attempt to use the codebase as a user/engineer/SRE/support human would use it; UX issues and bugs would be fixed. Refactoring agents would continually attempt to simplify the evolving codebase.
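That pipeline might be wired up roughly as below, with a placeholder agent() function standing in for real coding-agent invocations; every name and prompt here is hypothetical, and the sequencing (core first, parallel module implementation second, adversarial review third) is the only part taken from the process described above:

```python
# Sketch: orchestrating a from-scratch rebuild. agent() is a placeholder
# for a real coding-agent call; all prompts and names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def agent(task: str) -> str:
    """Placeholder for invoking a coding agent on a task."""
    return f"<artifact for: {task}>"

def rebuild(spec: str, modules: list[str]) -> dict[str, str]:
    # Core data structures and API boundaries come first, serially.
    core = agent(f"Design core data structures and API boundaries from: {spec}")
    # Modules behind each boundary can then be implemented in parallel.
    with ThreadPoolExecutor() as pool:
        impls = dict(zip(modules, pool.map(
            lambda m: agent(f"Implement module {m} against: {core}"), modules)))
    # Exercise the system as its users/engineers/SREs would, and file fixes.
    impls["__review__"] = agent("Use the system as a user/engineer/SRE; report issues")
    return impls
```

The serial-then-parallel shape is the point: the API boundaries are the synchronization barrier, and everything behind them is embarrassingly parallel until review closes the loop.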
Still, I’m skeptical that this process would work out of the box.
While working on Cartesian Tutor, I tried to set up fake LLM students to take my LLM-delivered lessons, so that I could iterate on my product without needing real students. This failed because I could never get the LLM to have the same issues as real students; it was always just an LLM pretending to be a student who didn’t know something, while actually knowing it perfectly well. Real students always had more surprising failure modes resulting from nonobvious knowledge or conceptual gaps (e.g. some students didn’t know that “/” could mean division, having only ever seen the ÷ symbol in their printed homework).
Similarly, the real-world system failures that happen at scale are always weirder than you could imagine. Diagnosing them, finding hotfixes, and then figuring out how to test the real fix is hard enough for a human; reproducing these system failures in an agent-testable way is inevitably going to be expensive and impractical.
Conclusion
In many ways, the value of a working codebase is not unlike the state of human civilization more generally. Collectively, we can do incredible things, but so much of it is illegible tribal knowledge (e.g. Fogbank), passed down orally through generations of apprentices, and through the capabilities of machines that fabricate the next generation of machines (e.g. semiconductor lithography steppers). We owe a lot to the work of people who refine human knowledge, extract its essence, and compile it into a single textbook to help future generations reach new heights. So far, this has proved to be a uniquely human capability.