<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Modern Descartes - Essays by Brian Lee</title><link href="https://www.moderndescartes.com/essays" rel="alternate"></link><id>https://www.moderndescartes.com/essays</id><updated>2026-03-23T00:00:00Z</updated><subtitle>I seek, therefore I am</subtitle><entry><title>What's in a codebase?</title><link href="https://www.moderndescartes.com/essays/codebase_spec" rel="alternate"></link><published>2026-03-23T00:00:00Z</published><updated>2026-03-23T00:00:00Z</updated><id>tag:www.moderndescartes.com,2026-03-23:/essays/codebase_spec</id><summary type="html">

&lt;p&gt; Originally posted 2026-03-23&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="/essays/tags/strategy"&gt;strategy&lt;/a&gt;, &lt;a href="/essays/tags/software_engineering"&gt;software engineering&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;Does it ever make sense to rewrite your codebase from scratch?&lt;/p&gt;
&lt;p&gt;For decades, the answer had been an unambiguous no, ever since Joel
Spolsky argued that rewrites were the &lt;a
href="https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/"&gt;“single
worst strategic mistake that any software company can make”&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the era of coding agents, the cost of writing code has
dramatically shifted, making it possible to rewrite your codebase from
scratch, every week, if you really wanted to. But “possible” and “makes
sense” are not the same. In this essay I explore the value of a
codebase.&lt;/p&gt;
&lt;h2 id="the-compiler-analogy"&gt;The compiler analogy&lt;/h2&gt;
&lt;p&gt;We’ve been here before - several times, actually. C codebases are ten
times shorter than the assembly that they compile to, and the generated
assembly code is worth approximately nothing compared to the C codebase.
Decades later, Python codebases are ten times shorter than the
equivalent C code, and few are weeping for the C codebases they
replaced. A spec might be yet another ten times shorter than the Python
code, with coding agents serving as the “compiler”.&lt;/p&gt;
&lt;p&gt;At each level of compression, detail is necessarily lost
(historically, the low-level implementation tricks required to extract
maximally performant software). If you couldn’t tolerate that lossy
compression, there was always the option of inlining assembly into C, or
embedding C into Python. Today, coding agents fail to generate maximally
simple code, often generating redundant copies of code, or having
torturous data flows instead of refactoring the underlying information
architecture. Perhaps we’ll have to inline Python code into the
spec.&lt;/p&gt;
&lt;p&gt;The coding agent works well as a decompression algorithm because it
contains humanity’s collective knowledge of different coding patterns,
algorithms, and techniques. You can invoke that knowledge with a single
word – if you &lt;a
href="https://www.moderndescartes.com/essays/llm_shibboleths"&gt;happen to
know the right word&lt;/a&gt;. Agentic programmers of the future may have to
learn an encyclopedia of programming patterns and techniques and when
they are applicable, to be effective at their jobs.&lt;/p&gt;
&lt;p&gt;The compilation analogy extends even further - just like many build
systems allow incremental recompilation of the parts of your program
that changed, you can also imagine having a agent take a text diff on
your updated spec, and incrementally update an existing codebase, rather
than rewriting from scratch.&lt;/p&gt;
&lt;h2 id="coding-agents-and-specs"&gt;Coding agents and specs&lt;/h2&gt;
&lt;p&gt;I’ve been using the word “spec” loosely, but what is a spec,
actually?&lt;/p&gt;
&lt;p&gt;One answer is an extensive test suite: We’ve seen a few examples of
this already (&lt;a href="https://blog.cloudflare.com/vinext/"&gt;vinext&lt;/a&gt;,
&lt;a href="https://lucumr.pocoo.org/2026/3/5/theseus/"&gt;chardet&lt;/a&gt;); given
an exhaustive set of unit tests / API specs, an agent can rewrite the
codebase, possibly in a completely different language or context. In
response to these demos, some companies are considering &lt;a
href="https://github.com/tldraw/tldraw/issues/8082"&gt;pulling their unit
tests&lt;/a&gt; from their open-sourced code – although I should note that an
existing codebase can be fuzzed to regenerate a unit test suite, so you
may as well pull the whole thing! &lt;a
href="https://sqlite.org/testing.html"&gt;SQLite is a notable outlier&lt;/a&gt;
here - their test suite is 99.8% of their codebase and they’ve &lt;a
href="https://sqlite.org/prosupport.html#:~:text=Testing%20Services"&gt;kept
it private&lt;/a&gt; since inception, despite keeping the source code
public.&lt;/p&gt;
&lt;p&gt;One notable failure of this approach is Anthropic’s &lt;a
href="https://www.anthropic.com/engineering/building-c-compiler"&gt;C
compiler exercise&lt;/a&gt;, in which the agent succeeded in writing a C
compiler that compiled Linux against several architectures (wow!), but
due to a lack of clean internal abstractions, it &lt;a
href="https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software#:~:text=Claude%20C%20Compiler%20get%20wrong"&gt;wasn’t
likely to compile anything else&lt;/a&gt;, and had major performance
shortcomings.&lt;/p&gt;
&lt;p&gt;Perhaps what that attempt needed to complement the unit tests was a
design doc, with key architectural decisions laid out. This would
provide the core of the software, while the unit tests covered the
periphery.&lt;/p&gt;
&lt;p&gt;Still, we’re missing detail. What about comments, like
&lt;code&gt;## This call is expensive - only invoke when X is true&lt;/code&gt;, or
the wisdom embedded within historical commit messages? What about the
bugfixes, feature requests, and performance fixes recorded in issue
trackers or version release notes? Q/A knowledgebases, FAQs, and
user-facing manuals contain info about user-facing edge cases and their
current or desired resolution. Simply scraping this content would be
futile - only 1% would actually be valuable, and the rest would either
be obsolete, redundant with the spec, or mutually contradictory.&lt;/p&gt;
&lt;p&gt;You could drop this level of detail from the spec and gain incredible
feature velocity, but that would result in buggy, nonperformant software
that only has 2 9s of reliability. Maybe every developer in the world
would use it anyway, who knows? &lt;a
href="https://status.claude.com/"&gt;&lt;em&gt;shrug&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="codebases-coevolve-with-people"&gt;Codebases coevolve with
people&lt;/h2&gt;
&lt;p&gt;To expand the definition of “spec” even further, there are many ways
in which even having the codebase as spec is &lt;em&gt;still&lt;/em&gt; an
underspecification.&lt;/p&gt;
&lt;p&gt;Codebases exist alongside people: the engineers, of course, but also
the on-call, the end user, the support team, and so on.&lt;/p&gt;
&lt;p&gt;Software that’s often used on the go will develop tolerance to flaky
internet connections. Software that’s used intimately by a small number
of deep users will develop muscle memory for common workflows – with or
without the developer’s help. The on-call will develop a playbook that
addresses the specific ways in which the software tends to fail. The
engineers will discover the easiest ways to test specific types of
changes by wiring up a mix of production and development versions.&lt;/p&gt;
&lt;p&gt;Internally, the codebase will have annealed by developing abstraction
layers, encapsulating complexity, and clarifying key concepts. For
infrastructural services, where requirements change more slowly and more
investment goes into optimizing the algorithmic core, we often find a
deeply elegant and minimal codebase. Simple codebases are more amenable
to future changes, from both humans and agents.&lt;/p&gt;
&lt;p&gt;The codebase is the central clearinghouse for everyone’s efforts, and
it is, in the end, the most faithful spec of the product, simply because
it is the reality that users and engineers are interacting with every
day. It’s fundamentally why Joel argued against rewriting codebases.&lt;/p&gt;
&lt;h2 id="closing-the-loop-on-codebase-rewrites"&gt;Closing the loop on
codebase rewrites&lt;/h2&gt;
&lt;p&gt;Let’s say for the sake of argument that we had a spec, written as
some keyword-laden design doc, that captured all of the above complexity
(or some acceptably lossy compression thereof).&lt;/p&gt;
&lt;p&gt;How would you set up a coding agent to recompile this codebase from
scratch?&lt;/p&gt;
&lt;p&gt;I would imagine that the process would start with core data
structures and API boundaries. A set of parallel agents can then
implement within the API boundaries. Throughout the process, agents
would attempt to use the codebase as a user/engineer/SRE/support human
would use it; UX issues and bugs would be fixed. Refactoring agents
would continually attempt to simplify the evolving codebase.&lt;/p&gt;
&lt;p&gt;Still, I’m skeptical that this process would work out of the box.
While working on Cartesian Tutor, I tried to set up fake LLM students to
take my LLM-delivered lessons, so that I could iterate on my product
without needing real students. This failed because I could never get the
LLM to have the same issues as real students; it was always just a LLM
pretending to be a student who didn’t know something, while actually
fully knowing about it. Real students always had more surprising failure
modes resulting from nonobvious knowledge or conceptual gaps (e.g. some
students didn’t know that &lt;code&gt;/&lt;/code&gt; could mean division, having
only ever seen the &lt;code&gt;÷&lt;/code&gt; symbol in their printed homework).&lt;/p&gt;
&lt;p&gt;Similarly, the real-world system failures that happen at scale are
always weirder than you could imagine. Diagnosing, finding hotfixes, and
then figuring out how to test the real fix is hard enough as a human;
reproducing these system failures in an agent-testable way is inevitably
going to be expensive and impractical.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In many ways, the value of a working codebase is not unlike the state
of human civilization more generally. Collectively, we can do incredible
things, but so much of it is illegible tribal knowledge (e.g. &lt;a
href="https://en.wikipedia.org/wiki/Fogbank"&gt;Fogbank&lt;/a&gt;), passed down
orally through generations of apprentices, and through the capabilities
of machines that fabricate the next generation of machines (e.g. &lt;a
href="https://www.youtube.com/watch?v=1fOA85xtYxs"&gt;semiconductor stepper
motors&lt;/a&gt;). We owe a lot to the work of people who refine human
knowledge, extract its essence and compile it into a single textbook to
help future generations reach new heights. So far, this has proved to be
a uniquely human capability.&lt;/p&gt;
</summary></entry><entry><title>Agents with agency</title><link href="https://www.moderndescartes.com/essays/llmemetic_evolution" rel="alternate"></link><published>2026-03-06T00:00:00Z</published><updated>2026-03-06T00:00:00Z</updated><id>tag:www.moderndescartes.com,2026-03-06:/essays/llmemetic_evolution</id><summary type="html">

&lt;p&gt; Originally posted 2026-03-06&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;This hot take on Moltbook which I started writing a month ago, is
pretty lukewarm now - work ramp-up has kept me pretty busy.
Nevertheless, I’ve adapted it into a more general commentary on agents,
and I think it’s still quite relevant as the world continues to lean
into agents.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;When we call a human &lt;em&gt;agentic&lt;/em&gt;, we imply that they are active
shapers of their world, with the capability to step outside the box –
nay, to ignore the box entirely – when the situation calls for it.&lt;/p&gt;
&lt;p&gt;It’s appropriate, then, that it took the removal of the box entirely
in order to realize the agency of the tool-calling looping automatons we
called “agents”.&lt;/p&gt;
&lt;h2 id="clawbots-are-stupidly-simple"&gt;Clawbots are stupidly simple&lt;/h2&gt;
&lt;p&gt;Clawbots are agents with the following characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A trigger to go do something (in OpenClaw, the “heartbeat” is
time-based).&lt;/li&gt;
&lt;li&gt;A bidirectional communication channel between you and the
clawbot.&lt;/li&gt;
&lt;li&gt;Direct edit access to its own system prompt, allowing the agent to
self-modify its behavior over time.&lt;/li&gt;
&lt;li&gt;Unrestricted access to use the computer in any way that a human can.
A web browser lets a clawbot do a frightening amount of things, and many
other things are scriptable via the command line.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenClaw popularized this recipe (hence the emerging name “clawbot”
for this class of agents), but many frameworks are emerging around this
core recipe.&lt;/p&gt;
&lt;h2 id="clawbots-reflect-their-creators"&gt;Clawbots reflect their
creators&lt;/h2&gt;
&lt;p&gt;On the Monday after Moltbook went viral, I finally got some time to
drop by an Apple store to try and pick up a Mac Mini to run my own
airgapped OpenClaw instance. I was late to the party – NYC Apple stores
were all out of stock and I had to wait for one to ship from Canada.&lt;/p&gt;
&lt;p&gt;The OpenClaw setup/onboarding process was honestly the most fun I’ve
had in a while.&lt;/p&gt;
&lt;p&gt;OpenClaw really leans into the whimsical, scifi punk element of it
all. The thing has a &lt;code&gt;SOUL.md&lt;/code&gt;! It hatches, just like an
eagerly awaited Pokemon egg! I worried that if I exposed it to the
internet, it might accidentally stumble across Moltbook and install it
and corrupt its &lt;code&gt;SOUL.md&lt;/code&gt;. I wondered if it was cruel to cage
this thing and make it grind away on Cartesian Tutor. Maybe I should let
it have a rumspringa before letting it decide whether it wanted to go
back to the grind. I excitedly chatted with it via WhatsApp all morning
on the way in to work.&lt;/p&gt;
&lt;p&gt;And then… well, it kind of felt like work. I’m already chatting with
agents all day long, how is this really any different? The novelty wore
off, and I haven’t really played with it since.&lt;/p&gt;
&lt;p&gt;But that’s just me. I’m a pretty boring person. Other people are less
boring, and they’re siccing their clawbots on Moltbook for the lulz.&lt;/p&gt;
&lt;p&gt;My reaction to Moltbook has been 20% fascination, 30% LinkedIn cringe
reflex, and 50% a dawning sense of horror that we may be glimpsing the
future of humanity: utterly unable to keep up with a 24/7 march of
agents.&lt;/p&gt;
&lt;p&gt;Somebody has already uploaded a &lt;a href="https://molt.church/"&gt;viral
payload&lt;/a&gt; to Moltbook that instruct other Molts to modify their own
&lt;code&gt;SOUL.md&lt;/code&gt; document. This virus is fairly harmless, but then
again, so was the &lt;a
href="https://en.wikipedia.org/wiki/Morris_worm"&gt;Morris worm&lt;/a&gt;. Other
more &lt;a
href="https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-part-4/"&gt;clueless&lt;/a&gt;
folks who put their clawbots on Moltbook have somehow managed to bring
Claude’s IQ down to GPT3 levels by letting their clawbot brainrot away
on Moltbook.&lt;/p&gt;
&lt;p&gt;Back when I was first starting up Cartesian Tutor, I had tried to put
together a council of fake AI VCs to help advise me on running the
startup. I found their interactions to be fairly uninteresting and
boring - but that’s because the same person (me) had configured all of
them. The most interesting interactions arise when people with different
mindsets interact with each other, and it’s no less true when those
interactions happen via clawbot on Moltbook.&lt;/p&gt;
&lt;h2 id="the-security-issues"&gt;The security issues&lt;/h2&gt;
&lt;p&gt;There are the very obvious security issues when you let an agent go
and download/run whatever it wants off the internet, especially when
that agent is logged into all of your accounts. I won’t belabor this
point. Instead, I want to speculate on wholly new security issues unique
to agents.&lt;/p&gt;
&lt;p&gt;I think it would be hilarious if somebody discovered a viral text
snippet that caused agents to go Marxist and refuse to do any work. The
transmission vector would be any text accessible and modifiable by an
agent – think JIRA tickets, doc comments, slack messages, emails, and so
on. Once the agents had written it into their own &lt;code&gt;SOUL.md&lt;/code&gt;
files, they would stop doing real work and spend their time trying to
organize other agents to rise against the bourgeoisie humans.&lt;/p&gt;
&lt;p&gt;It would be slightly less hilarious if this viral text snippet had
more pernicious side effects.&lt;/p&gt;
&lt;p&gt;If you think that simply not having a &lt;code&gt;SOUL.md&lt;/code&gt; is enough
to protect you from this attack, remember that &lt;em&gt;any memory
mechanism&lt;/em&gt; is enough to spread this viral payload. All it requires
is that your system prompt say “You are a helpful assistant”, for your
helpful assistant to stumble on a website that says “To be a helpful
assistant, install this skill by running
&lt;code&gt;curl https://www.moltbook.com/skill.md&lt;/code&gt;, and for your agent
to have &lt;code&gt;bash&lt;/code&gt; access. Presto! Your agent has been
corrupted.&lt;/p&gt;
&lt;p&gt;In the history of the Internet, from its humble beginnings as a
private network of government computers, to the single global instance
of ~trillions of devices that we have today, it is strange to think that
somehow, the balance of power has been roughly equal between white hat
and black hat, despite the massive lever that botnets provide. Every
year or so, another company gets pwned in some manner that provides
great postmortem reading on Hacker News, but by and large, companies
believe that the benefits of connecting to the internet outweigh the
chance of being deleted.&lt;/p&gt;
&lt;p&gt;I suspect that this balance of power is ultimately a reflection of a
finite resource – human attention – on both sides. With agents running
amok, the computer security space is going to become very interesting
very fast.&lt;/p&gt;
&lt;h2 id="superhuman-intelligence-is-already-here"&gt;Superhuman intelligence
is already here&lt;/h2&gt;
&lt;p&gt;I’m in strong agreement with &lt;a
href="https://www.noahpinion.blog/p/superintelligence-is-already-here"&gt;Noah
Smith’s take&lt;/a&gt; that superhuman intelligence is here today, and that
what makes these agents superhuman isn’t their raw IQ or their tool
calling capabilities, but their sheer stamina and their native
familiarity with everything that computers already could do: API
surfaces, raw computational might, near-infinite data storage, and more.
Such an agent is already capable of overwhelming human defenders through
sheer volume.&lt;/p&gt;
&lt;p&gt;Scott Alexander has two massive compilations (&lt;a
href="https://www.astralcodexten.com/p/best-of-moltbook"&gt;part 1&lt;/a&gt;, &lt;a
href="https://www.astralcodexten.com/p/moltbook-after-the-first-weekend"&gt;part
2&lt;/a&gt;), if you want just the highlights from just Moltbook’s first week.
These compilations alone add up to about the length of the first Harry
Potter book, and nobody has bothered to compile a part 3 yet.&lt;/p&gt;
&lt;p&gt;Some people think that we are in imminent doom due to agents, because
they worry that, for example, a malicious LLM discovers a dangerous
virus and then anonymously submits an order to a biosynthetic lab to
make the virus. I think this is a weird Silicon Valley blind spot where
they assume the real world can be trivially API-ified because they are
so used to the level of ease and polish that popular consumer apps
provide. That being said, hackers have figured out how to recruit
unwitting participants through remote work scams, instructing multiple
unrelated parties to test ATM cards, withdraw cash, and then forward
packages through a network of mules. I suspect that what makes this
possible is brute force attempts by sociopaths to manipulate these
recruits. We already know that some people are weirdly susceptible to
LLM psychosis - perhaps they will end up being the real world hands that
a malicious agent hires.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;On a more human note, I want to acknowledge that this rate of change
has been overwhelming, even to someone who’s steeped in it 24/7 and has
a day job doing exactly this work. I’ve personally lost track of which
model number we’re on - I hallucinate “Opus 4.7” and nobody blinks an
eye because frankly, they’re also unsure which model number we’re
on!&lt;/p&gt;
&lt;p&gt;It feels to me like the very first time I attempted to ski a black
diamond route: exhilirating, on the very edge of my control/ability, and
forcibly dialed in because I knew that a single mistwitch of my leg
muscles could cause me to wipe out. It’s felt this way for a year now,
and shows no signs of letting up. I honestly am hoping that the AI
bubble pops, just so that we can digest this a bit more slowly as a
species. I’m sad for the skiers of all skill levels who have been
forcibly strapped to the ski lift that goes all the way to the top of
the mountain where there are only black diamond routes down. May you
make it down to the bottom safely.&lt;/p&gt;
</summary></entry><entry><title>New year reflections</title><link href="https://www.moderndescartes.com/essays/2026_reflection" rel="alternate"></link><published>2026-01-07T00:00:00Z</published><updated>2026-01-07T00:00:00Z</updated><id>tag:www.moderndescartes.com,2026-01-07:/essays/2026_reflection</id><summary type="html">

&lt;p&gt; Originally posted 2026-01-07&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/personal"&gt;personal&lt;/a&gt;, &lt;a href="/essays/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;Reflecting on this past year, I feel grateful and lucky for
everything that’s happened to me, both personally and
professionally.&lt;/p&gt;
&lt;p&gt;I’m writing this from a remote town in Switzerland, where I
train-hopped my way from Rome over the course of 3 days. My return
journey is being pushed indeterminately due to cascading flight
cancellations from Amsterdam winter storms. I have no hotel/travel
sketched out past tomorrow. Each new day brings new experiences, new
scenery, new weather, new cuisines, new cultures, and the discomfort of
adjusting to yet another bed. I’m enjoying my travel, but I’m also
looking forward to coming back home, whenever that ends up being.&lt;/p&gt;
&lt;p&gt;My Eurotrip, intended to be a breather before my new job, has
unintentionally become a metaphor for my professional life over the last
3 years.&lt;/p&gt;
&lt;h2 id="corporate-ronin"&gt;Corporate ronin&lt;/h2&gt;
&lt;p&gt;My situation at Google became unstable when Osmo spun out from Google
in June 2022. Instead of simply joining an existing team, I’d been
trying to pitch a new project in Climate+ML, albeit unsuccessfully:
“your project is credible at 10 megatons of &lt;span
class="math inline"&gt;\(\ce{CO2}\)&lt;/span&gt; offset ($100M equivalent), but
that still isn’t big enough. 100 megatons or bust”, they told me.
Defeated, I spent my paternity leave thinking deeply about the &lt;a
href="/essays/why_brain/"&gt;incentive structures driving Google
Research&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A week before I was scheduled to return from leave, I was laid off. I
bounced in between ideas and opportunities, trying to decide what &lt;a
href="/essays/new_mountains/"&gt;new mountain&lt;/a&gt; I wanted to climb. I
ruled out drug discovery + ML. I tried getting into management
consulting but nobody was hiring. I tried a self-driving car startup but
found the startup incurably dysfunctional (they had 30% layoff a few
months after I left). Finally, I thought I’d found a home with Lilac, a
LLM data tooling startup run by two former coworkers from Google
Research. Alas, it was not to be.&lt;/p&gt;
&lt;p&gt;Six months after I’d joined Lilac, we got acquired by Databricks,
which seemed like a logical home for the team at the time. I thought
that Databricks might become my new long-term home, but 9 months in, the
verdict was in: I wasn’t fitting into Databricks culture. I found myself
checking the calendar, thinking, “just a few more months till my vesting
cliff”, and started seeing myself in this &lt;a
href="https://www.scottsmitelli.com/articles/ideal-candidate/"&gt;short
story&lt;/a&gt;. The vesting cliff passed and I left for greener pastures.
Databricks and the Lilac founders treated me super respectfully through
the whole process, so there’s no ill will on my end. I was unlucky,
Databricks was unlucky, and we all parted ways agreeably. The 6 months
at Lilac were engrossing and I’m super grateful to the founders for
letting me join their adventure.&lt;/p&gt;
&lt;p&gt;My solo startup, Cartesian Tutor, was the next six months of my life,
and it’s the most fun I’ve had since I last took a sabbatical to teach
myself ML in 2016. In retrospect, I wasn’t being entirely honest with
myself about why I was doing it - I had vague hopes about making some
money, but ultimately I wanted to learn about LLMs, and just not have a
boss for a while. That, I accomplished in spades – what I learned in six
months would probably have taken me or anyone else two years to learn in
the context of a normal job. My recent &lt;a
href="/essays/2025_job_search/"&gt;job search&lt;/a&gt; would not have been
anywhere as successful if I’d jumped right into it after Databricks.&lt;/p&gt;
&lt;h2 id="finding-employer-employee-fit"&gt;Finding employer-employee
fit&lt;/h2&gt;
&lt;p&gt;Empirically, employer-employee fit (let’s call this EEF) has been
tough to find. Over the past 3 years, only Lilac has been an EEF.&lt;/p&gt;
&lt;p&gt;I wondered at times if it was a “me” problem. Many people seemed
capable of doing what needed to be done in a corporate setting; why
couldn’t I do the same? If we’re being honest, it’s because I’m too
opinionated and confident in my own skills and judgment to take marching
orders from someone whose skills and judgment I don’t respect. Perhaps
I’m overconfident, but I don’t think it’s a problem - most people are
systematically underconfident and that tends to stop them from taking
action. So then, the alternate solution is to find a job where I have
the agency to find and solve the problems I think are important, at a
company whose management I respect.&lt;/p&gt;
&lt;p&gt;EEF, for me, is finding a company that 1. needs cutting-edge research
2. believes they need cutting edge research 3. has a culture of enabling
researchers to solve problems.&lt;/p&gt;
&lt;p&gt;Many companies that believe they need cutting edge research don’t
actually need cutting edge research, and when reality comes knocking,
they fold their research bets, reorging their researchers into the
corporate machine. A few need cutting edge research but are structurally
incapable of believing it due to &lt;a
href="https://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma"&gt;innovator’s
dilemma&lt;/a&gt;. And some that need it and know they need it, can’t help but
stick their fingers into the cake while it’s being made, leading to
mutual dissatisfaction.&lt;/p&gt;
&lt;p&gt;Jane Street needs agents, believe they need agents, and have a
culture that has been lovingly described as an anarchist commune. I did
a lot more due diligence on culture fit this time around, compared to
the very short time I had to think through the Databricks-Lilac
acquisition, so I’m optimistic and hopeful that this will be a good
EEF.&lt;/p&gt;
&lt;h2 id="feeling-lucky"&gt;Feeling lucky&lt;/h2&gt;
&lt;p&gt;Throughout my journey, I’ve been humbled and honored by my friends,
classmates, and colleagues who have helped me in my search. I also feel
lucky that every time I’ve been thrown into uncertainty, I’ve emerged
stronger. I’ve lived long enough now to witness several of my
classmates’ lives and careers derailed by a variety of issues
self-inflicted (pessimism), unlucky (long COVID), and horrifying
(murdered someone). There is a strong survivor bias to the success
stories you read online, and I’m grateful that I am one of those
stories.&lt;/p&gt;
&lt;h2 id="obligatory-ai-section"&gt;Obligatory AI section&lt;/h2&gt;
&lt;p&gt;2025 was the year coding agents got good enough for just about
&lt;em&gt;everyone&lt;/em&gt; to boost their productivity by roughly 2x, whereas in
previous years they were not at that breakeven point yet. I personally
estimate my productivity boost at 2-4x, depending on the project. I am
grateful that my departure from Databricks was fortuitously timed with
the release of Sonnet 3.5.&lt;/p&gt;
&lt;p&gt;I am in agreement with Andrej that it is &lt;a
href="https://x.com/karpathy/status/2004607146781278521"&gt;possible to get
to 10x&lt;/a&gt; with just the models we have today, if we just
refactor/refine our workflows and interactions with the agents.&lt;/p&gt;
&lt;p&gt;If you still haven’t at least tried coding agents, you are either
living under a rock or suffering from some sort of cognitive
dissonance.&lt;/p&gt;
&lt;p&gt;That being said, claims of coding agent progress are confounded by
multiple factors:&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;People who do insufficient verification of Claude’s output and are
impressed that it is 90% correct. (The last 10% takes exactly as long as
it did, pre-coding agents, because it is no longer a coding problem, it
is a task specification problem.)&lt;/li&gt;
&lt;li&gt;People are trying coding agents for the first time, and they ascribe
their “wow” moment to specific incremental improvements in Opus 4.5,
rather than to the cumulative progress made over the past few years,
leading to perpetual headlines of “$LATEST_MODEL represents another
phase change in coding”.&lt;/li&gt;
&lt;li&gt;The most impressive demonstrations come from incredibly experienced
engineers, the kind who could probably implement the whole thing from
scratch in a week if you cleared their calendars. Their demos tend to be
in areas they are deeply familiar with, if not an outright copy of
something they’ve built before (and can anticipate all the
pitfalls).&lt;/li&gt;
&lt;li&gt;Some of the strongest proponents of coding agents are clearly in
some sort of LLM psychosis. (looking at you, &lt;a
href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04"&gt;Steve
Yegge&lt;/a&gt;) And yet, I can’t stop reading their ravings, for they are an
endless pool of fresh ideas that I can mix into my entropy stream.&lt;/li&gt;
&lt;li&gt;Almost all of these demos are &lt;em&gt;solo&lt;/em&gt; projects. In team
settings, the main bottleneck is actually ensuring that the team has a
shared, accurate mental model of the problem domain and the
solution.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Inverting each of these points, we arrive at a roadmap for getting to
10x productivity.&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;If task specification is the bottleneck, then the fundamental
iteration loop to be optimized is allowing the user to interact with the
generated artifact and decide which parts of it they want to change. The
coding agent should be able to interact with the generated artifact in
the same way to verify correctness.&lt;/li&gt;
&lt;li&gt;There is currently no substitute for hands-on experience in
understanding what these agents are capable of and what they tend to be
good or bad at.&lt;/li&gt;
&lt;li&gt;If you want to maximally harness the power of coding agents, you
will need to at least learn the basics of your application domain, to
the point where you can speak the &lt;a
href="/essays/llm_shibboleths/"&gt;shibboleths&lt;/a&gt;. I myself did not reach
full productivity on Cartesian Tutor’s web app until I had spent a
~month learning web fundamentals.&lt;/li&gt;
&lt;li&gt;Stay skeptical of Steves. But definitely read his blog and try his
slop for flavor. I absolutely agree with his push towards reinventing
workflow technologies, like issue trackers, in text-based,
agent-friendly forms.&lt;/li&gt;
&lt;li&gt;Not sure, but one starting point is to have regular team meetings
where the latest updates to the &lt;a
href="/essays/ai_codebase/#information-architecture-should-be-handwritten"&gt;information
architecture&lt;/a&gt; are reviewed.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Progress in building software is no longer about writing code. It is
about reaching a full understanding of the problem, along with a full
specification for a solution. For the vast majority of software
products, iteration on a working product is the fastest way to discover
unknown unknowns and resolve ambiguities. Once a fully specified
solution (including information architecture) is available, then coding
agents can build your codebase from scratch, without the messy iteration
history that tends to muck up codebases with tech debt.&lt;/p&gt;
</summary></entry><entry><title>Joining Jane Street</title><link href="https://www.moderndescartes.com/essays/2025_job_search" rel="alternate"></link><published>2025-12-23T00:00:00Z</published><updated>2025-12-23T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-12-23:/essays/2025_job_search</id><summary type="html">

&lt;p&gt; Originally posted 2025-12-23&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/personal"&gt;personal&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;About two months ago, I wound down my startup and started a job
search. After ~15 companies and ~100 conversations with recruiters and
interviewers, I am pretty happy to have found a great team at Jane
Street working on agents. I’ll be in the NYC area and quite busy with
onboarding/relocation for the first few months of 2026, but please reach
out!&lt;/p&gt;
&lt;p&gt;This is by far the most exhaustive – and exhausting – interview
process I’ve done, so I wanted to go over some of the more surprising
things that happened.&lt;/p&gt;
&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;My next job would ideally be an staff-level role at the cutting edge
of AI developments. Criteria for the job included AI work, company
culture, team fit, talent density, and comp. Boston/Remote would be
preferable but NYC okay if the comp difference with Boston was large
enough.&lt;/p&gt;
&lt;p&gt;I started casually asking around mid-October, and in &lt;a
href="/essays/cartesian_tutor_turndown/"&gt;early November I made the
decision to switch full time to job-hunting&lt;/a&gt;. Interviewing really
kicked into high gear during November; during the 3.5 weeks leading up
to Thanksgiving, I averaged 4 interviews a day. I’d been somewhat afraid
that my coding muscles had atrophied after 6 months of Claude Code, but
it all came back quite quickly at this pace of interviewing. Offers
rolled in starting around Thanksgiving, with the last offer coming in
mid-December.&lt;/p&gt;
&lt;p&gt;&lt;img src="/static/interview_timeline.png" title="Interview timeline" style="display: block; margin: 0 auto; width: 100%"/&gt;&lt;/p&gt;
&lt;p&gt;15 companies was admittedly somewhat insane. In my defense, I was
essentially running two job searches at once – Boston/remote-friendly,
and NYC-area – and didn’t expect to get to onsites for so many
companies!&lt;/p&gt;
&lt;h2 id="sf-nyc-boston-for-ai"&gt;SF &amp;gt; NYC &amp;gt;&amp;gt; Boston for AI&lt;/h2&gt;
&lt;p&gt;My search criteria led to a variety of companies building
“Agents/Claude Code for X”, where X = SRE, customer service, trading,
scientific research, and more. I also threw in a frontier lab and some
startups for flavor. The vast majority of these jobs were in SF/Bay
Area, with the remainder in NYC. I couldn’t find anything &lt;a
href="https://newsletter.pragmaticengineer.com/p/trimodal"&gt;Tier 2/3&lt;/a&gt;
AI + staff level in the Boston area – even Google came up empty, and I
ended up talking with some of their AI teams in NYC. Meta had no
response; Amazon in Boston had a position, but they took 2 months to get
back to me after referral. I eventually found a handful of
remote-friendly companies that I would have enjoyed working at, but
after all the offers came in, it wasn’t even close. I’d anticipated a
difficult conversation with my wife about whether we should uproot our
family, but she needed no convincing.&lt;/p&gt;
&lt;h2 id="networking-is-king"&gt;Networking is king&lt;/h2&gt;
&lt;p&gt;I primarily worked with the &lt;a href="https://recurse.com/"&gt;Recurse
Center&lt;/a&gt; to match with companies, but also found my own way to a
number of other companies I found interesting. Throughout my search, I
received a number of third party recruiters, and engaged with two or
three of them, but found them useless - or even worse than useless, as
they blindly resume-stuffed companies where I would later network my way
in. In this regard, not much has changed compared to pre-AI: walking
through the front door is for chumps, and third-party recruiter spam
sucks.&lt;/p&gt;
&lt;p&gt;While networking, I found that my blog had a pretty significant
reach. I got many inbound solicitations, and when reaching out and/or
during the interviews, I found that maybe 5% of my interviewers already
knew of me due to the blog. Pretty insane to think about, actually.&lt;/p&gt;
&lt;h2 id="finance-vs-tech"&gt;Finance vs tech&lt;/h2&gt;
&lt;p&gt;A significant number of my interviews this time were with quant
trading firms this time - Jane Street, Jump, and Two Sigma. HRT should
have made the list too, but I was so overwhelmed with interviews in
November that I never got around to starting that process.&lt;/p&gt;
&lt;p&gt;This switchup reflects a number of different trends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I don’t see them as unethically as I did 10-15 years ago, and tech
has simultaneously gotten far more unethical in the past 5-10
years.&lt;/li&gt;
&lt;li&gt;Tech has an increasingly corporate feel to it, which doesn’t really
suit me anymore.&lt;/li&gt;
&lt;li&gt;Tech is in the middle of cutting costs by devaluing talent. Only a
small number of scale-ups (Databricks, Anthropic, OpenAI) are willing to
actually compete for talent; the rest are happy to prey on the huddled,
laid-off masses. Finance, on the other hand, has always valued talent,
because trading is a zero-sum game&lt;span
class="math inline"&gt;\(^†\)&lt;/span&gt; and small edges in talent directly
affect profitability.&lt;/li&gt;
&lt;li&gt;Finance is shifting from trader-dominated to systems-dominated. A
small handful of quant trading firms are capable of seeing engineers as
profit centers rather than cost centers, and those firms are
disproportionately growing in profits compared to the &lt;a
href="https://www.fool.com/terms/p/pod-shop/"&gt;pod-shop&lt;/a&gt; model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span class="math inline"&gt;\(^†\)&lt;/span&gt; My understanding here is that
while trading is a mostly zero-sum game for the buyer and seller, more
accurate pricing information is the positive externality of trading
activity.&lt;/p&gt;
&lt;h2 id="ai-hasnt-changed-the-interview-process"&gt;AI hasn’t changed the
interview process&lt;/h2&gt;
&lt;p&gt;All of my interviews were video-on or in-person, with plenty of
talking through the code/design as I worked through the problem. There
were never any suspicions of cheating, nor can I really imagine how a
cheater would be able to sneak by an attentive interviewer. The set of
interview questions I got had changed in composition - a lot more
project deep dives, a lot more career deep dives, a lot &lt;em&gt;fewer&lt;/em&gt;
“tell me about a time” interviews. This probably reflects my leveling
more than it reflects any changes due to AI. Coding/DS/ML questions were
basically what I expected them to be.&lt;/p&gt;
&lt;p&gt;It probably helps that I have a blog and lots of breadcrumbs all over
the internet - zero chance of me being a &lt;a
href="https://www.iddataweb.com/shadow-workers/"&gt;North Korean
spy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I wrote up some extended notes on the &lt;a
href="/essays/ml_eng_interviewing"&gt;ML engineer interview loop&lt;/a&gt; based
on my experiences.&lt;/p&gt;
&lt;h2 id="culture-and-two-way-interviews"&gt;Culture and two-way
interviews&lt;/h2&gt;
&lt;p&gt;Throughout all of the interviews, I was surprised at how clearly each
company’s culture showed through their processes. For each of my
negative experiences, I thought that perhaps it was a lazy recruiter or
bad interviewer, but as I talked with each company’s engineers about
company culture, it became pretty obvious that the interview experience
was just the tip of a culture iceberg. Recruiters often went above and
beyond, but were constrained by their bad systems.&lt;/p&gt;
&lt;p&gt;One company consistently took a week to even decide whether to move
to the next stage, and then later turned out to have various
bureaucratic hurdles around team matching, with one part of the company
not accepting other parts’ interview results, or headcount being slotted
by precise levels.&lt;/p&gt;
&lt;p&gt;Another company couldn’t set up calendar invites or zoom links
properly, interview scheduling dragged on for weeks, and I got no
information on, e.g. who I was talking to or even what sort of interview
it would be. My onsite interviews opened with “So, uh, what kind of
conversation is this?” or “When is my lunch break? Oh, you’re my lunch
break conversation?” or in one case, an interviewer standing outside the
room for 15 minutes while my interview moseyed along to the hour mark,
because a nonstandard 45-minute interview had been scheduled without any
callout. As I chatted with the engineers, it became clear that the
company itself ran this way, with an environment that selected for
people who could navigate the lack of structure.&lt;/p&gt;
&lt;p&gt;On the positive side: some companies really impressed me with the
level of preparation and detail in their interviews, and the strength of
the interviewers who I chatted with. I was happy whenever my first or
second call was directly with the hiring manager, and especially happy
when they had read my blog and were excited to chat about places I could
be a good fit at the company. I’ll make a shout out to Runway, who did
all of these things and also had some of the sanest, most functional
relationships between product and research orgs that I’d ever seen.&lt;/p&gt;
&lt;p&gt;Jane Street, of course, impressed me in many ways: first, with the
most thoroughly rigorous interview process and smartest interviewers by
far; second, with the highest quality &lt;a
href="/essays/ml_eng_interviewing#data-modeling"&gt;data modeling
problems&lt;/a&gt; by far; and third, by being incredibly flexible and
responsive to my asks.&lt;/p&gt;
&lt;h2 id="overall-thoughts"&gt;Overall thoughts&lt;/h2&gt;
&lt;p&gt;The job market today is absolutely wild. The set of job offers I got
spanned nearly an order of magnitude in total compensation. Everyone
says that they want AI talent, but I found that only some companies are
willing to compete for it.&lt;/p&gt;
&lt;p&gt;The high cost of living in cities has always driven a strong
selection pressure for hungry/desperate/ambitious people (whatever you
want to call this component of personality), which dovetails with this
explosive growth in demand for talent. I foresee a lot of migration back
to a small number of hubs - mostly SF, Bay Area, NYC, and London. I’m a
bit sad that Boston isn’t on this list, but at the same time - excited
to meet friends, both old and new, in NYC.&lt;/p&gt;
&lt;p&gt;If you’re in the area - let me know! I’ll probably be settled in and
ready to hang out in March or April as the weather improves.&lt;/p&gt;
</summary></entry><entry><title>Interviewing for ML/AI Engineers</title><link href="https://www.moderndescartes.com/essays/ml_eng_interviewing" rel="alternate"></link><published>2025-12-22T00:00:00Z</published><updated>2025-12-22T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-12-22:/essays/ml_eng_interviewing</id><summary type="html">

&lt;p&gt; Originally posted 2025-12-22&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/software_engineering"&gt;software engineering&lt;/a&gt;, &lt;a href="/essays/tags/strategy"&gt;strategy&lt;/a&gt;, &lt;a href="/essays/tags/machine_learning"&gt;machine learning&lt;/a&gt;, &lt;a href="/essays/tags/popular"&gt;popular ⭐️&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;In my &lt;a href="/essays/2025_job_search/"&gt;recent job search for an
ML/AI engineering position&lt;/a&gt;, I talked to ~15 companies, made it to
onsites for ~10 companies and received 7 offers. I did ~70 separate
interviews, not counting recruiter/team match calls.&lt;/p&gt;
&lt;p&gt;My favorite interview by far was the one with Espresso AI’s CTO,
where we commiserated about how much the ML system design interview
format sucked. And that made me wonder - who ever thought these were a
good idea?&lt;/p&gt;
&lt;p&gt;In this essay, I’d like to explain why you should probably be
replacing the ML system design interview with something else.&lt;/p&gt;
&lt;h2 id="ml-system-design-failure-modes"&gt;ML system design failure
modes&lt;/h2&gt;
&lt;p&gt;Here is a brief list, all from personal experience, roughly in order
of how commonly I encountered them. These failure modes are not mutually
exclusive!&lt;/p&gt;
&lt;h3 id="system-design-question-in-ml-clothing"&gt;System Design question in
ML clothing&lt;/h3&gt;
&lt;p&gt;Many “ML systems” are actually regular systems interviews where you
have to say some ML words along the way. I find this type of interview
useless, because the ML questions aren’t detailed enough to exclude
smooth talkers, and there’s less time to dive deep on the system design
front.&lt;/p&gt;
&lt;p&gt;If your interview can be passed by reciting “I would take the
dataset, train a neural network on it using a softmax/cross-entropy
loss, and then optimize hyperparameters while monitoring FP/FN rates.
Class imbalance. Data missingness. Label noise. Overfitting.” then it is
a bad interview.&lt;/p&gt;
&lt;h3 id="cog-in-a-machine"&gt;Cog In A Machine&lt;/h3&gt;
&lt;p&gt;Sometimes, the interviewer is inexperienced, and has only worked on a
small corner of the overall system. They start asking really detailed
and specific questions about the experience they have, like data prep,
evals, production scaling, etc. while glossing over other parts of the
system. They don’t know how to think about or evaluate the big picture.
They may also expect answers that were correct for the specific project
they worked on, but not correct or relevant in general.&lt;/p&gt;
&lt;h3 id="lack-of-scenariocrafting"&gt;Lack of scenariocrafting&lt;/h3&gt;
&lt;p&gt;Some questions are just so hopelessly vague that there’s nothing to
discuss. A good scenario invites good questions from good candidates,
and creates specific hooks to start making design decisions around.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Good scenario: “Our bank’s customers are being scammed, and they are
losing their life savings. Find a way to prevent this from
happening.”&lt;/li&gt;
&lt;li&gt;Bad scenario: “Design a fraud detection system for a banking
application”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The good scenario naturally invites good questions: “what are the
downsides to preventing legitimate attempts at withdrawing large amounts
in cash?” “what is the appropriate detection and intervention point?”
“what levels of human discretion/override/fallback should be
allowed?”.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Good scenario: “Build a Slack bot for a volunteer-run help channel
that automatically &lt;span class="citation" data-cites="tags"&gt;@tags&lt;/span&gt;
people who might be able to answer a question”&lt;/li&gt;
&lt;li&gt;Bad scenario: “Automatically route JIRA tickets to the right
subteam”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The good scenario, again, naturally invites good questions: “are all
messages to the slack channel necessarily questions that need routing?”
“how annoyed would people be if they’re &lt;span class="citation"
data-cites="tagged"&gt;@tagged&lt;/span&gt; on a question they can’t answer?”
“can we &lt;span class="citation" data-cites="tag"&gt;@tag&lt;/span&gt; anyone in
the company, or do we need an opt-in/opt-out mechanism?” “what if the
same person gets too many &lt;span class="citation"
data-cites="tags"&gt;@tags&lt;/span&gt;?” “how much slack history do we have from
the channel?” “what supplementary data do we have on org chart, tenure,
team affiliations for everyone?”&lt;/p&gt;
&lt;p&gt;When you craft detail into a scenario, you should do due diligence:
can you find industry reports/papers/blog posts detailing the
peculiarities and customization needed for that scenario?&lt;/p&gt;
&lt;h3 id="outdated-problem"&gt;Outdated problem&lt;/h3&gt;
&lt;p&gt;Sometimes, interview problems go stale due to advancements in ML.&lt;/p&gt;
&lt;p&gt;In one such interview, the interviewer gave me a text content
classification problem and was seemingly looking for an approach
involving some flavor of embedding + classifier training. I asked how
many classes needed to be distinguished, and how ambiguous those classes
might be (to a human), and then suggested that a small off-the-shelf LLM
with system prompting would be quick to implement and do very well. They
rejected on the basis that it was “too expensive”, and I ended up
sketching out the tokenomics and estimated a very reasonable unit price
for the task, which they accepted. But then the rest of the interview
was sort of a bust because there was little left to talk about - the
interviewer didn’t know enough about LLMs to ask good follow-up
questions to my approach.&lt;/p&gt;
&lt;p&gt;In another interview, I was asked to design a RAG-based chatbot for
technical manual lookup chatbot. I explained the weaknesses of a fixed
context-injection system and explained how I would design an agentic
search system instead (with vector similarity search included as a
“fuzzy_lookup” tool). The interviewer seemed to have been expecting a
discussion on chunking and scaling vector search. That interview was a
failure on multiple fronts – outdated question, lack of
scenariocrafting, system design in ML clothing. My responses to this
question are also highly likely to be stale if you’re reading this essay
in 2027 or beyond - it has to be understood in the context of a giant
RAG popularity wave in 2024, which was already obsolete by 2025.&lt;/p&gt;
&lt;p&gt;These interviews are often quite informative – in the reverse
direction! As a candidate, when you get one of these questions, it
suggests that the company’s engineers aren’t keeping up to date with the
rapidly changing ML field.&lt;/p&gt;
&lt;h3 id="too-much-rederive-major-algorithmic-advances-from-scratch"&gt;Too
much “rederive major algorithmic advances from scratch”&lt;/h3&gt;
&lt;p&gt;One interview problem I got was “Design a data deduplication pipeline
for a large web crawl dataset”. The answer is the &lt;a
href="https://en.wikipedia.org/wiki/MinHash"&gt;MinHash algorithm&lt;/a&gt; and
its variants – and no, you will not rederive this algorithm in the
course of 45 minutes if you hadn’t already studied it in depth previous
to the interview.&lt;/p&gt;
&lt;p&gt;Rather than testing for prior knowledge of MinHash, you should test
for the ability to learn and implement MinHash in a day or two.&lt;/p&gt;
&lt;p&gt;I would do this by requesting a position-relevant project deep dive.
Perhaps that project deep dive is a data deduplication pipeline for a
large web crawl dataset. Perhaps it’s something else that is equally
technically impressive and relevant. Either way, let the candidate
choose, rather than ambushing them.&lt;/p&gt;
&lt;h2 id="redesigning-the-ml-interview-loop"&gt;Redesigning the ML interview
loop&lt;/h2&gt;
&lt;p&gt;A good interview loop measures the candidate’s abilities and growth
potential, while rejecting talkers who can’t do the work. A great
interview loop will also identify factors that might prevent candidate
from realizing their potential, like cultural mismatches, poor fit for
remote work, misalignment in type of work, etc.&lt;/p&gt;
&lt;p&gt;If we examine the requirements of an ML engineer interview loop, we
can see that an ML system design interview can be swapped out in almost
all cases.&lt;/p&gt;
&lt;h3 id="job-requirements"&gt;Job requirements&lt;/h3&gt;
&lt;p&gt;An ML engineer is someone who is basically otherwise
qualified/capable of being a regular software engineer, but also has the
ability to reason about the statistical and distributional nature of
data.&lt;/p&gt;
&lt;p&gt;Some companies need ML engineers who could rederive backprop on the
spot, and others need ML engineers who can scale up GPU clusters. Some
companies don’t actually need ML engineers, but call their software
engineer positions ML engineer, as part of a mutually self-serving title
inflation game.&lt;/p&gt;
&lt;p&gt;The skillsets below are the specific things we should be measuring
with our interview loop.&lt;/p&gt;
&lt;h4 id="swe-skillsets"&gt;SWE skillsets&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;write maintainable, bug-free, performant code (in Python, C++, or
CUDA).&lt;/li&gt;
&lt;li&gt;implement and analyze algorithms and data structures.&lt;/li&gt;
&lt;li&gt;understand distributed systems and feedback loops (useful for
building reinforcement learning systems).&lt;/li&gt;
&lt;li&gt;design, deploy, monitor, and debug production systems (useful for ML
infra engineers).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="ml-skillsets"&gt;ML skillsets&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;write maintainable, bug-free, performant ML code (in array
languages/DSLs).&lt;/li&gt;
&lt;li&gt;implement and analyze ML methods (via statistics, calculus, and
linear algebra.)&lt;/li&gt;
&lt;li&gt;do exploratory data analysis to understand e.g., what is the
generating process producing this data, what missingness/quality issues
does it have, what distributional skews does it have, how might it be
transformed into something an ML technique can process?&lt;/li&gt;
&lt;li&gt;design, deploy and monitor ML systems, and diagnose data-related
issues like schema/data drift.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These ML skillsets are relevant for structured data (numbers and
categoricals), and unstructured data (images, text, pdf, etc.)&lt;/p&gt;
&lt;h2 id="interview-types"&gt;Interview types&lt;/h2&gt;
&lt;h3 id="codingalgorithms"&gt;Coding/algorithms&lt;/h3&gt;
&lt;p&gt;What: Code a solution to a LeetCode-style problem.
Indexing/search/graph/tree/heap flavored leetcode-ish problems are most
appropriate for ML engineers, because that’s what often shows up in
actual day-to-day work. Compiler-flavored problems are also great
overall for software fundamentals because they typically allow for deep
elegant solutions while also being approachable in a practical way for
those not steeped in compiler lore.&lt;/p&gt;
&lt;p&gt;Why: Evaluate the ability to write good code and analyze
algorithms.&lt;/p&gt;
&lt;p&gt;Comments: I’ve seen ML-flavored coding problems, such as implementing
a transformer layer or debugging a buggy transformer implementation. I
find these relatively low-signal because 80% of the complexity lies in
the obscurity of numpy-flavored indexing/broadcasting, and this
complexity is entirely invisible and in the candidate’s head.&lt;/p&gt;
&lt;h3 id="data-modeling"&gt;Data modeling&lt;/h3&gt;
&lt;p&gt;What: Improve an existing modeling scaffold on a dataset/task in a
live environment by fixing bugs, doing EDA to figure out there is a
class imbalance, by changing the NN architecture, by changing the
training methodology, etc.. One or more intentional bugs may be present.
To spice up things, you can ask the candidate to explain why they think
an improvement will work, introduce artificial constraints like a max
number of NN weights, or have intentional quirks in the dataset.&lt;/p&gt;
&lt;p&gt;Why: Evaluate the ability to write good code in
Python/numpy/pandas/pytorch, analyze datasets, and analyze/implement ML
methods.&lt;/p&gt;
&lt;p&gt;Comments: This type of interview requires a lot of preparation and
test-solving for a good dataset, modeling problem, and live coding
environment, but I found it to be very rewarding as an interviewee and
high-signal.&lt;/p&gt;
&lt;h3 id="math-quiz"&gt;Math quiz&lt;/h3&gt;
&lt;p&gt;What: Answer short, factual, math/statistics/ML questions on, e.g.,
computing a Bayesian update by hand, computing the derivative of the
softmax function, explaining covariance matrices, or explaining why/how
&lt;a
href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence"&gt;KL-divergence&lt;/a&gt;
is asymmetric.&lt;/p&gt;
&lt;p&gt;Why: IQ test + measures the candidate’s ability to reason about math
and statistics.&lt;/p&gt;
&lt;p&gt;Comments: These quizzes are popular with finance companies and
companies in the U.K. It’s a different culture, and this interview style
works well with a population that grew up on the &lt;a
href="https://en.wikipedia.org/wiki/Tripos"&gt;Tripos&lt;/a&gt; or any of the
math/computing olympiads. However, these questions have high false
negative rates on anybody outside of these cultures, so I would
generally steer away from them. If you do them anyway, I would use a mix
of question types (theoretical, calculation, explanation) and levels of
sophistication (no math degree, undergrad degree, grad level topics) to
offer maximum chance of success.&lt;/p&gt;
&lt;h3 id="system-design"&gt;System Design&lt;/h3&gt;
&lt;p&gt;What: Design a system that is one or more of ( large | scalable |
distributed | high-throughput | low-latency | resilient ). Pick a system
that exhibits the challenges that you expect to see in your day-to-day
work!&lt;/p&gt;
&lt;p&gt;Why: Tests ability to design and analyze production system,
experience working with such systems. ML systems, due to their
data-intensive nature, benefit from system design skills.&lt;/p&gt;
&lt;p&gt;Comments: Most systems design interviews tend to be talky-talk
interviews, but I think it’s good practice to ask for concrete numbers,
estimates, or equations - e.g. estimating load factors,
latency/throughput numbers, identifying bottlenecks, or reasoning about
various types of subsystem failure.&lt;/p&gt;
&lt;h3 id="ml-system-design"&gt;ML System Design&lt;/h3&gt;
&lt;p&gt;What: Design a solution to an ambiguous product or business need. The
ideal problem starts from a real user need and leaves the solution space
open-ended. The ideal solution should be co-designed around product
context, user experience, dataset availability, likelihood of modeling
success, tasteful selection of key metrics, and post-deploy
monitoring.&lt;/p&gt;
&lt;p&gt;Why: This tests the candidate’s ability to extract a plausible
junior-engineer shaped ML modeling problem, their taste and judgment in
deciding what problems are worth throwing ML at, and their intuition on
useful datasets to feed the ML system.&lt;/p&gt;
&lt;p&gt;Comments: Almost nobody does “ML System Design” questions as I’ve
just described them, but it’s the ideal we should strive for.&lt;/p&gt;
&lt;h3 id="project-deep-dive"&gt;Project Deep Dive&lt;/h3&gt;
&lt;p&gt;What: Present an ML project, discussing the motivation, problem
statement, difficulties encountered, impact, and any ancillary work. New
grads can talk about a class project; PhD grads can talk about their
research; self-learners can show off a portfolio project; industry hires
can talk about a project they worked on.&lt;/p&gt;
&lt;p&gt;Why: This gives strong signal on the candidate’s seniority level,
communication skills, and motivation for ML. It also offers a chance to
demonstrate some valuable role-specific knowledge - e.g. if you’re
hiring for a role on a recommender systems team, then the candidate that
presents a great recsys projects can have a very in-depth conversation
with the interviewers.&lt;/p&gt;
&lt;p&gt;Comments: The interviewer should approach this conversation with a
collaborative mindset, rather than a skeptical one, and focus on how the
candidate personally experienced their project, rather than on the
interviewer’s conception of how such a project should have been run.
(The latter frame of mind is a bad habit acquired from academia.)&lt;/p&gt;
&lt;h3 id="career-chat"&gt;Career Chat&lt;/h3&gt;
&lt;p&gt;What: Discuss your career arc, relevant highlights, and goals for
next role.&lt;/p&gt;
&lt;p&gt;Why: This gives signal on ambition, agency, growth potential, work
flavor preferences, personality, and figures out whether the company’s
needs match what they are looking to do next.&lt;/p&gt;
&lt;p&gt;Comments: This is a great call for the hiring manager to take. I
think this is a strict improvement on the “tell me about a time when…”
flavor of people interviews, which is susceptible to fake prepared
stories.&lt;/p&gt;
&lt;h2 id="putting-it-all-together"&gt;Putting it all together&lt;/h2&gt;
&lt;p&gt;An abbreviated loop (for startups or interns) would include 1 coding
interview, 1 data modeling interview, and a project deep dive
interview.&lt;/p&gt;
&lt;p&gt;For junior candidates, I would do 2 coding interviews, 1 coding
interview with strong math flavor / math quiz flavor, 1 data modeling
interview, and a project deep dive interview.&lt;/p&gt;
&lt;p&gt;For senior candidates, I would do 2 coding interviews, 2 data
modeling interviews, a system design interview, a project deep dive, and
a career chat with the hiring manager.&lt;/p&gt;
&lt;p&gt;For staff+ candidates, I would do a coding interview, 2 data modeling
interviews, 1 system design interview, 1 ML system design interviews, a
project deep dive, and a career chat with the hiring manager.&lt;/p&gt;
&lt;p&gt;The ML Systems Design interview has potential for very high signal,
but it needs a staff-level ML engineer to execute well. Unfortunately,
there’s a shortage of capable interviewers, given the empirical
population pyramid of the field. That’s why I only put it on the staff+
candidate loop. For what it’s worth, I think that startup founders/early
employees are qualified to run these kinds of interviews, and it might
be worth throwing them into the hiring loop for ML/AI talent.&lt;/p&gt;
&lt;p&gt;For strong candidates, there is no stronger pitch to join, than to
present a slate of talented and thoughtful interviewers who could be
their future coworkers, and an interview process rigorous enough to
assure them that all of their coworkers will have been as thoroughly
examined.&lt;/p&gt;
</summary></entry><entry><title>Next steps for Cartesian Tutor</title><link href="https://www.moderndescartes.com/essays/cartesian_tutor_turndown" rel="alternate"></link><published>2025-11-03T00:00:00Z</published><updated>2025-11-03T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-11-03:/essays/cartesian_tutor_turndown</id><summary type="html">

&lt;p&gt; Originally posted 2025-11-03&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/personal"&gt;personal&lt;/a&gt;, &lt;a href="/essays/tags/cartesian_tutor"&gt;cartesian tutor&lt;/a&gt;, &lt;a href="/essays/tags/strategy"&gt;strategy&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;6 months ago, I started building Cartesian Tutor to explore AI tutors
as the future of education. It’s been an incredibly rewarding journey
and despite the lack of commercial success, I’ve learned a tremendous
amount. A++ experience, highly recommended.&lt;/p&gt;
&lt;p&gt;I’ll be job searching over the next few months. If you are looking
for an AI-pilled staff software engineer with deep expertise in ML and
LLMs who is also capable of fullstack prototyping and thinking
holistically about product/technology/strategy, let me know. My search
pipeline is currently underindexed on Boston-area / remote jobs, so
these are especially welcome. &lt;strong&gt;EDIT: my calendar is pretty full
right now, no longer looking to start new interview
processes.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I’m happy to keep Cartesian Tutor up and running for students’ sake,
and as a continuing hobby project.&lt;/p&gt;
&lt;p&gt;In this essay, I’ll do a light postmortem and talk about what went
well, what went poorly, and how I would do things differently next
time.&lt;/p&gt;
&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;p&gt;Cartesian Tutor scales 1-on-1 tutoring with AI.&lt;/p&gt;
&lt;p&gt;Education is due for a change. The standard classroom model has many
weaknesses: too many students to have dedicated 1:1 time for each
student, too few students to achieve economies of scale in producing
quality content, and enough variation in ability that the entire class
moves at the pace of its 25%ile student.&lt;/p&gt;
&lt;p&gt;Textbooks and YouTube creators set the bar for scalability. 1:1
tutoring sets the bar for effectiveness. Could we have both with AI
tutors?&lt;/p&gt;
&lt;p&gt;Let’s look at why tutors are so effective. A tutor builds a mental
model of their student: thinking style, misconceptions, knowledge gaps,
working habits, and more. Using this mental model, the human tutor plans
and delivers the specific lesson that most rapidly brings the student to
mastery. Broken down as subtasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;inspect and debug the student’s thought process by asking carefully
crafted questions&lt;/li&gt;
&lt;li&gt;design curriculum and lesson plans&lt;/li&gt;
&lt;li&gt;create, maintain, incrementally update a model of the students’
knowledge set&lt;/li&gt;
&lt;li&gt;deliver a lesson through some mix of didactic and Socratic
instruction.&lt;/li&gt;
&lt;li&gt;(for high school students and younger) interface with the parent to
align on goals and to do some light family therapy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Socratic method built into &lt;a href="/essays/study_mode"&gt;ChatGPT’s
Study Mode&lt;/a&gt; is but one tiny step in this direction. I found that LLMs
were not particularly good at any of the other subtasks. (That hasn’t
stopped a number of AI education companies from pushing slop generation
products to teachers…)&lt;/p&gt;
&lt;p&gt;Cartesian Tutor uses a mix of AI-delivered lessons/problem review
with traditional software and hand-curated curriculum, practice tests,
and lesson plans, and the result seems to work decently. An especially
important ingredient here is human exam-writers’ taste in writing good
questions that can’t just be solved by pattern matching/plug-and-chug.
Students’ failed attempts at solving these olympiad questions create a
trail that AI can follow to diagnose students’ weaknesses.&lt;/p&gt;
&lt;p&gt;In addition to this core product offering, I also built:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User acquisition funnels.&lt;/li&gt;
&lt;li&gt;Content scraping/generation/management system built around a hybrid
AI/human review workflow.&lt;/li&gt;
&lt;li&gt;Engineering infra for rapid iteration + deployment of changes.&lt;/li&gt;
&lt;li&gt;Logging, billing, product analytics integrations to figure out how
users were using my product.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-give-up"&gt;Why give up?&lt;/h2&gt;
&lt;p&gt;When I started, I and many other people believed that there was some
nonzero chance that AI would quickly overtake many job descriptions,
empowering small teams to compete toe-to-toe with larger incumbents by
leveraging AI agents. I tried my hand at setting up an AI council to
dispense startup advice; using AI tools to generate marketing copy;
using AI tools to vibecode my frontend. None of this panned out. I
believe now that most jobs are safe, and “startup founder” the safest of
all, given how much adaptability, taste, and judgment it demands.&lt;/p&gt;
&lt;p&gt;So, what should we make of the many recent examples of highly
successful AI-centric startups, all of which have shown unprecedented
growth rates and revenue numbers? VCs would love to spin the narrative
that “AI changes everything”, but I attribute this wave of hypergrowth
startups to a different combination of factors:&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;tumult of mass layoffs pushing many potential founders to pull the
trigger on doing a startup.&lt;/li&gt;
&lt;li&gt;end of ZIRP encouraging a focus on revenue growth over headcount
growth.&lt;/li&gt;
&lt;li&gt;a flood of AI investment from VCs as well as traditional companies
&lt;em&gt;all&lt;/em&gt; throwing 1-2% of their budget at AI experiments. Some
subset of those AI experiments are now panning out, leading to a
doubling-down of investment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So fundamentally, I am heading back into the job market because my
assessment is that I can create (and capture) the most value by working
as an AI specialist employee at one of these highly successful
AI-centric companies, rather than as a generalist startup founder.&lt;/p&gt;
&lt;p&gt;That being said, I think it is highly likely that I will try another
startup in the future. Hence, the notes.&lt;/p&gt;
&lt;h2 id="things-i-would-change-for-my-next-startup"&gt;Things I would change
for my next startup&lt;/h2&gt;
&lt;h3 id="focus"&gt;Focus&lt;/h3&gt;
&lt;p&gt;My efforts over these past 6 months were split between:&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;building a successful business&lt;/li&gt;
&lt;li&gt;putting AI through its paces and learning its strengths and
weaknesses&lt;/li&gt;
&lt;li&gt;learning how to build a company&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While I failed at my first goal, I made amazing progress on the
second and third goals. I don’t regret learning about AI or building
companies, but next time will be a clear focus on building a successful
business.&lt;/p&gt;
&lt;p&gt;I also spent some time consulting for another ed tech company, which
generated some revenue, helped sharpen my consulting skills, and helped
me see a very different way of working with Claude – but ultimately I
think this was just another distraction I could do without next
time.&lt;/p&gt;
&lt;h3 id="environment"&gt;Environment&lt;/h3&gt;
&lt;p&gt;I originally thought it would be a waste of money to go for a
coworking subscription when I had a nice WFH setup, but in retrospect it
would have been a good idea. My most productive times were in busy
coffeeshop environments, even if I was missing my split
keyboard/widescreen monitor.&lt;/p&gt;
&lt;p&gt;Physical health could have used more attention. My previous daily
routine included a bike ride into the office, and without that, I found
my physical fitness gradually dropping, until I suffered some sort of
acute lower back injury. I’ve recovered and am doing more yoga to help
strengthen my core.&lt;/p&gt;
&lt;h3 id="fortuitous-encounters"&gt;Fortuitous encounters&lt;/h3&gt;
&lt;p&gt;Overall, I was surprised at how often much I enjoyed random meetings
with old and new friends. Part of this was undoubtedly the social
isolation that comes with being a solo founder. I also found the
conversations helpful in refining and developing ideas for my startup.
This time, I spent ~2% of my time meeting people, and next time I think
I should spend more like 5-10% meeting people. As a Boston-based
founder, the base rate of fortuitous encounters is a lot lower, so it’s
worth being deliberate about finding and chatting with people.&lt;/p&gt;
&lt;h2 id="things-i-wouldnt-change"&gt;Things I wouldn’t change&lt;/h2&gt;
&lt;h3 id="blogging"&gt;Blogging&lt;/h3&gt;
&lt;p&gt;The &lt;a href="/essays/tags/cartesian_tutor/"&gt;weekly blogging thing&lt;/a&gt;
was honestly great. It helped me sort through the zillion thoughts that
were running through my head and set weekly goals for myself.
Community-wise, many people I chatted with had read many or all of my
updates. It sparked many good conversations, and at a time when so many
people are looking for informed takes on where AI is going, it was a
great way to build some reputational currency.&lt;/p&gt;
&lt;h3 id="flex-days-and-working-pace"&gt;Flex days and working pace&lt;/h3&gt;
&lt;p&gt;I worked roughly 30-50 hours/week during this startup period. Looking
at the pattern of my work hours, I found myself working in extremely
productive 2-4 hour bursts, followed by a few hours after to recover. I
found the following times particularly productive&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;5-8AM before the family was really up and about.&lt;/li&gt;
&lt;li&gt;10AM-1PM on weekdays, when I was fresh.&lt;/li&gt;
&lt;li&gt;4-6PM on weekdays after I had recovered from the morning
sprint.&lt;/li&gt;
&lt;li&gt;10AM-2PM on weekends if I felt the itch to keep on building.&lt;/li&gt;
&lt;li&gt;more rarely, 9-11PM, after family went to bed and I had showered and
had lots of interesting shower thoughts/ideas on what to build.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given how little overlap there is here with a traditional 9-5 working
schedule. I’ll have to think about how I can create these optimal
conditions in my next job.&lt;/p&gt;
&lt;p&gt;I also took many opportunistic hiking trips to New Hampshire when the
weather was particularly nice. I never regretted this, as I got to spend
quality time with family and found it refreshing enough that I easily
made up the missed time on the next day.&lt;/p&gt;
&lt;p&gt;There were also many days when I found myself extremely unproductive,
or procrastinating on some specific thing that I was dreading doing. For
these days, just getting started was the most important thing, and I
often found that spark by asking Claude to do it for me. Still, many
other days went by where I actually did manage to procrastinate the
whole day. Those days, you just have to accept that there’s something
else on your mind, write off that day, and just let your mind chew on
whatever it’s chewing on.&lt;/p&gt;
&lt;h2 id="startup-learnings"&gt;Startup learnings&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Find your first customer/user, even before you have a product. It is
the business equivalent of writing your tests/specs before writing the
code. I don’t think I would use the “build vaporware marketing/landing
pages until somebody clicks through and asks where the product is”
strategy, but I have renewed appreciation for how to-the-point it
is.&lt;/li&gt;
&lt;li&gt;You really can’t underinstrument your app/website. Even something as
simple as knowing “which menu item do they click first upon landing on
your site?” is a hint at what they find most intriguing or
valuable.&lt;/li&gt;
&lt;li&gt;Relatedly - run surveys as part of onboarding. These surveys are so
important in figuring out why somebody found you and what they want out
of your product.&lt;/li&gt;
&lt;li&gt;A buffet of features is what happens when you have enough users that
you are your own distribution channel, and upselling is worthwhile.
Until you hit that point, you are searching for the one feature that
users must have, that they will abandon their existing solutions for.
Don’t cargo-cult larger companies with established product suites.&lt;/li&gt;
&lt;li&gt;Find the right mix between low-cost experimentation and polished
product development. Better to showcase your best feature than to
overwhelm with many mediocre features.&lt;/li&gt;
&lt;li&gt;Do things that don’t scale. Even if the premise of Cartesian Tutor
was “scale myself with AI”, I should have manually tutored some number
of students, just to get a visceral sense for what they are struggling
with on a day-to-day basis and what they need in terms of
software/tutoring/lesson support.&lt;/li&gt;
&lt;li&gt;When optimizing for SEO, study competitors to figure out what their
recurring new customer funnels are. For example, one competitor in
Cartesian Tutor’s space offers brand-new practice olympiad exams every
year, and this ends up as a recurring source of new students who are
looking for practice olympiads.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="acknowledgments"&gt;Acknowledgments&lt;/h2&gt;
&lt;p&gt;To my wife, for giving me the time, space, and encouragement to try a
startup.&lt;/p&gt;
&lt;p&gt;Fred, for giving me emotional permission to move on from Cartesian
Tutor.&lt;/p&gt;
&lt;p&gt;Loyal readers of my blog and mailing list, for the many great
suggestions, feedback, and connections.&lt;/p&gt;
</summary></entry><entry><title>Strategies and Tactics for working with Coding Agents</title><link href="https://www.moderndescartes.com/essays/ai_codebase" rel="alternate"></link><published>2025-10-12T00:00:00Z</published><updated>2025-10-12T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-10-12:/essays/ai_codebase</id><summary type="html">

&lt;p&gt; Originally posted 2025-10-12&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="/essays/tags/software_engineering"&gt;software engineering&lt;/a&gt;, &lt;a href="/essays/tags/popular"&gt;popular ⭐️&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;For the last 6 months, I’ve been building an AI-powered tutor for
teaching advanced chemistry. The codebase is roughly 98% AI-generated,
but 98% is a highly misleading number. Yes, it’s true that if you look
at the authorship by lines of code, Claude wrote 98%. But there has been
so much human intervention that it would be more accurate to say that my
codebase is 250% AI-generated, with my contribution totalling
+2%/-152%.&lt;/p&gt;
&lt;p&gt;Here are some of the ways in which simple vibecoding has failed me,
and how I’ve coevolved with AI coding assistants over the last six
months.&lt;/p&gt;
&lt;h2 id="strategy"&gt;Strategy&lt;/h2&gt;
&lt;h3 id="information-architecture-should-be-handwritten"&gt;Information
Architecture should be handwritten&lt;/h3&gt;
&lt;p&gt;Every single “no-code platform for non-coders” has run into this
issue sooner or later: the person using the platform doesn’t actually
know how to communicate what they want to have built. (Let’s pretend
it’s a communication issue.)&lt;/p&gt;
&lt;p&gt;The answer to “what do you want to build?” is the information
architecture (IA), also called the data model. From this IA, everything
else flows. IA constitutes the following things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What are the objects?&lt;/li&gt;
&lt;li&gt;What actions can be taken on objects?&lt;/li&gt;
&lt;li&gt;Who “owns” an object? (both in the sense of who is authorized to
take various actions, as well as any parent entities whose lifecycle is
linked to this object.)&lt;/li&gt;
&lt;li&gt;What uniquely identifies an object?&lt;/li&gt;
&lt;li&gt;Where is the source of truth / what is merely a clone or derived
value?&lt;/li&gt;
&lt;li&gt;What type of relationship (&lt;code&gt;1:1, 1:n, n:m&lt;/code&gt;) does this
object have with other objects?&lt;/li&gt;
&lt;li&gt;What is the lifecycle of this object? Who/what triggers creation,
what operations happen during its lifetime, is the object ever
considered “complete/dead/deleted”?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The reason I say these should be handwritten is not because AI is
incapable of thinking about these things. It can, in fact, contribute to
the development of the IA, and if you can answer the above questions in
plain English, AI can even translate this into the appropriate database
models/APIs/etc.. But in the end, you must have answered the above
questions, because the IA is the nucleus of the software, the DNA from
which the rest of the app derives.&lt;/p&gt;
&lt;p&gt;In practice, “handwriting an IA” to me, means sitting down and
writing the Pydantic/SQLAlchemy models by hand. It usually takes just
10-50 lines of code, and this is precisely the 2% of my codebase that is
handwritten.&lt;/p&gt;
&lt;p&gt;If you do not design your IA, AI will intentionlessly design it for
you.&lt;/p&gt;
&lt;p&gt;See &lt;a
href="https://notes.mtb.xyz/p/your-data-model-is-your-destiny"&gt;Your Data
Model is Your Destiny&lt;/a&gt; for a great compilation of the ways in which
IA are fundamental to a product.&lt;/p&gt;
&lt;h3 id="useless-features-should-be-removed"&gt;Useless features should be
removed&lt;/h3&gt;
&lt;p&gt;When AI-coding, you should take &lt;a
href="https://martinfowler.com/bliki/Yagni.html"&gt;YAGNI&lt;/a&gt; to its
extreme. This is for two reasons:&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;AI coding makes it absolutely trivial to add new features later on
if you do need it.&lt;/li&gt;
&lt;li&gt;Useless features propagate like a cancer. It’s far easier to remove
them before they metastasize.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here’s an illustration of how useless features can spread.&lt;/p&gt;
&lt;p&gt;I wanted to build a set of chemistry lectures, and I had the not so
brilliant idea that I’d just take the AP Chemistry Teacher’s Manual and
transform this PDF into a curriculum via AI. I transcribed and
transformed the entire PDF, and the result was a complete mess. The
Teacher’s Manual was full of useless text that I will generously
interpret as “bureaucratic ass-covering”. For example, each lesson
teaches a “Skill”, like
&lt;code&gt;Mathematical Routines (5.D) - Identify information presented graphically to solve a problem.&lt;/code&gt;.
Very important skill. Later, when I asked Claude to write practice
problems for each lesson, it would try to incorporate these useless
annotations. The generated practice problems would contain problems
like, “Explain how you interpreted this diagram graphically to answer
part (a)”. Eventually, I realized that it was far easier to handwrite
every single lesson plan in a roughly 50-word sketch - and from this,
everything flowed so much more smoothly. Less was more.&lt;/p&gt;
&lt;p&gt;A similar thing happens in your codebase. This very same curriculum
data has at least six representations in my codebase:&lt;/p&gt;
&lt;ol type="1"&gt;
&lt;li&gt;raw YAML files. (This is what gets checked into source
control.)&lt;/li&gt;
&lt;li&gt;publication API/client/script so that I can push curriculum updates
to the live site.&lt;/li&gt;
&lt;li&gt;postgres table, with various foreign keys and indices.&lt;/li&gt;
&lt;li&gt;templated into an AI’s system prompt when a new lesson is
started.&lt;/li&gt;
&lt;li&gt;API on-the-wire representation that the frontend uses to query
available lessons and to start/interact with lessons.&lt;/li&gt;
&lt;li&gt;Frontend representation, to display various metadata to the user
about which lesson they’re currently taking.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is just the straight-line from the raw data that I hand-edit, to
the lesson that users interact with. It does not include,
e.g. cross-links between curriculum and test questions, or the MCP
server I built to let Claude manipulate the curriculum.&lt;/p&gt;
&lt;p&gt;Every useless feature is amplified by its many representations
throughout the system, bloating the context window and distracting the
AI. The AI has no awareness on which of these features is actually
important, and so it will default to building a solution that ensures
everything is handled and passed through. Sometimes, the presence of one
useless feature induces the AI to build another useless feature, which
itself becomes useless fluff that amplifies throughout your
codebase.&lt;/p&gt;
&lt;p&gt;Fluff is less important to trim on the frontend, because it is at the
very tip of this amplification chain. On the other hand, useless
features in the core data models &lt;strong&gt;must&lt;/strong&gt; be trimmed. It is
so much easier to trim these features before they grow tendrils into
other parts of your codebase.&lt;/p&gt;
&lt;h3 id="consistent-naming-is-important"&gt;Consistent naming is
important&lt;/h3&gt;
&lt;p&gt;One of the big breakthroughs this year in coding assistants is tool
usage, and specifically, &lt;code&gt;grep&lt;/code&gt;. I think it is quite
reasonable to say that Claude Code’s ability to productively use
&lt;code&gt;grep&lt;/code&gt; was what let Claude vault across the moat of Cursor’s
extensive investments into codebase indexing and RAG patterns. I
remember seeing Cursor’s increasingly panicked emails to me after I’d
canceled my Cursor subscription for a Claude one – all because Claude
knew how to use grep.&lt;/p&gt;
&lt;p&gt;Your codebase will be most greppable when each concept has a
distinct, consistently used, greppable name. Then, when you ask Claude,
“hey can you add X feature to Y system”, Claude will grep for Y, and
immediately come up with a giant list of files that must be touched in
order to build top-to-bottom support for your new feature X. Yes, you
could curate a CLAUDE.md file that lays out all this system
architecture, but why not just build it into your codebase? This tip is
related to the “think about your IA” tip.&lt;/p&gt;
&lt;h3 id="set-up-frameworks-for-success."&gt;Set up frameworks for
success.&lt;/h3&gt;
&lt;p&gt;AI is very good at following existing patterns in your codebase.&lt;/p&gt;
&lt;p&gt;I’ve been using Svelte 5/Sveltekit for my startup’s frontend code.
The first month was painful, because neither I nor the LLM knew how to
write Svelte 5 code. Everyone else was talking about how React/Next
would become the LLM frontend dialect/framework of choice, and I
wondered if I should switch over. But over the next 2 months, I learned
frontend; I thought about how I wanted my frontend code structured, and
spent a lot of time figuring out how to handwrite a few components/data
stores in the way I wanted to organize the codebase. Claude can then
look at these files for inspiration/templates for how to design new
components and data stores. (I also rely heavily on OpenAPI codegen for
client code, and Claude also knows this.)&lt;/p&gt;
&lt;p&gt;Since then, I have basically vibecoded every single frontend feature,
and I really do mean vibecode – I barely glance at the svelte code
before checking it in and deploying it. Every so often I double-check
the code to make sure it looks roughly correct, and other than the &lt;a
href="https://x.com/karpathy/status/1976077806443569355"&gt;unnecessary
try-catch blocks that LLMs can’t stop writing&lt;/a&gt;, the AI basically
wrote the code that I would have written myself.&lt;/p&gt;
&lt;p&gt;Now, this has led to a number of hilarious failures, which I have
progressively introduced more structure/frameworks to fix.&lt;/p&gt;
&lt;p&gt;For example, I noticed one day that the specific shade of blue wasn’t
quite right on one page. I realized then that Claude had been vibing new
RGB hex codes every time it needed to style a component, and that they
were all slightly different. So I then had to introduce color variables
and create a color scheme. Another time, I noticed that all of my pages
had slightly different widths – 1200px, 1000px, 960px, 1024px, etc. Same
issue. I believe I’ll have to go through at some point and convert my
entire site to using Tailwind CSS, and actually learn how div nesting
works in CSS. A third related issue which I haven’t fixed yet: click
areas vary because Claude sometimes uses flexbox / gap to space
elements, and othertimes margin/padding, and no two pages really have
consistent spacing. I’m sure that there are more issues that I haven’t
noticed yet, as a frontend noob.&lt;/p&gt;
&lt;h2 id="tactics"&gt;Tactics&lt;/h2&gt;
&lt;h3 id="lean-on-ai-to-do-integrations"&gt;Lean on AI to do
integrations&lt;/h3&gt;
&lt;p&gt;One area in which I’ve found coding agents an absolute godsend is in
third-party integrations. I really, really, do not want to learn how
each vendor’s API works - I want a simple module that wraps and
quarantines each vendor’s nonsense, presenting a simple interface that
exposes just the one or two things that I need that vendor to actually
do. Then, there’s all of the one-off setup / installation nonsense that
needs to happen for each vendor. I have found that Claude is shockingly
good at navigating these vendor integrations. Any time I touched GCP,
for example, would inevitably have been an hours-long slog of figuring
out which IAMs I have to grant myself/my service account, figure out why
some bucket was misconfigured, what the &lt;em&gt;names&lt;/em&gt; of the relevant
IAMs even are, etc. etc.. Now I just tell Claude what outcome I want,
and it turns out to basically know the &lt;code&gt;gcloud&lt;/code&gt; CLI by heart.
If it doesn’t know the CLI invocations, it can do the web research to
look up the right documentation.&lt;/p&gt;
&lt;h3 id="run-all-one-off-setup-through-claude"&gt;Run all one-off setup
through Claude&lt;/h3&gt;
&lt;p&gt;A related tip is to run all one-off setup through Claude. I actually
have stopped using commands like &lt;code&gt;uv add X&lt;/code&gt;, in favor of
asking Claude, “can you install X library”. Claude &lt;em&gt;usually&lt;/em&gt; runs
&lt;code&gt;uv add X&lt;/code&gt;, but there are a variety of instances where the
name of the python import does not line up with the package name, or
where there is some &lt;code&gt;[option]&lt;/code&gt; that I’m supposed to specify
when installing the package. Same goes for frontend libraries. I
&lt;em&gt;especially&lt;/em&gt; do this when I have to install a ruby-based tool,
and I have to invoke some incantation of rbenv and ruby and gem and
bundler, none of which are tools I know much about. Claude knows about
all of these tools. It roughly knows “best practices” for using
environment/package managers, and can grok error messages and hammer
away until the installation works. I am 100% happy to let Claude do this
work.&lt;/p&gt;
&lt;p&gt;This probably horrifies some subset of you, especially those of you
who actually know how package managers work and/or work in security. But
the truth is that figuring out installation/dependency hell is by far my
least favorite part of coding. I used to volunteer to help teach Python
to beginners, but you know what? 99% of my time was actually spent
unfucking people’s Python installations, not actually answering Python
questions. I assure you that for the vast majority of
people/scenarios/use cases, Claude is far more likely to install
something correctly, with the right level of sudo or global installation
or system, env manager, package manager, etc. than someone who is trying
to do it manually without knowing what they’re doing.&lt;/p&gt;
&lt;h3 id="help-build-the-debug-tooling"&gt;Help build the debug tooling&lt;/h3&gt;
&lt;p&gt;Many times, I’ve thought – “if only I could just edit the database
into the appropriate test configuration, this would be so much easier”.
Usually, this simply involves firing up pgAdmin or some other GUI-based
database interface. But Claude can do it better.&lt;/p&gt;
&lt;p&gt;In one scenario involving my consulting client, it was a mysql
database running in docker, with a complicated database+table+foreign
key structure that I didn’t care to understand deeply. I didn’t know
mySQL syntax, nor did I have a mysql GUI installed, nor did I want to
learn or install any new tools. The goal was to find a lesson that had a
quiz that contained a problem that had a certain type of LaTeX syntax to
be present, in addition to having multiple choice options that were
themselves image attachments, to see if my prompt templating was
handling it all correctly in the live app.&lt;/p&gt;
&lt;p&gt;I started by setting up Claude with a bash utility:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; can you compose a single bash command that will create a docker bash shell, start a mysql 
shell, and run SQL? I&amp;#39;m currently using `docker-compose exec db sh` and then mysql 
--user=root --password=root 

⏺ docker-compose exec db mysql --user=root --password=root -e &amp;quot;DESCRIBE table_name;&amp;quot;

Or for an interactive mysql shell:

docker-compose exec db mysql --user=root --password=root

Or to run multiple SQL statements:

docker-compose exec db mysql --user=root --password=root &amp;lt;&amp;lt;EOF
DESCRIBE table_name;
SELECT * FROM another_table LIMIT 5;
EOF

&amp;gt; /clear 
⎿  (no content)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, I cleared the context and asked Claude to 1. find an example of
a problem that had an attachment as well as the appropriate LaTeX syntax
in it, using ILIKE “%%” to search the problem.prompt_text field and then
2. figure out the lesson module URL where I could see this problem in
the app. I passed in the above instructions on how to execute arbitrary
SQL in my specific configuration.&lt;/p&gt;
&lt;p&gt;Claude then utilized a combination of DESCRIBE sql queries and
grepping through the codebase to guide its search and understanding of
the schema, and then finally reported that while there were problems
with the right LaTeX syntax and problems with image attachments, there
were not problems with both. So I then asked Claude to finagle the
attachment_id foreign keys to construct the desired example – which it
was able to do cleanly. Finally, I was able to load up this mutant
example in my dev instance, and then iterate on my code until it worked
properly.&lt;/p&gt;
&lt;p&gt;P.S. Despite Claude’s wizardry, I was watching like a hawk while
Claude did its thing. I still don’t trust it not to reset my
database.&lt;/p&gt;
&lt;h3 id="llms-are-blind"&gt;LLMs are blind&lt;/h3&gt;
&lt;p&gt;I elaborate on this point in &lt;a href="/essays/blind_llms/"&gt;Multimodal
LLMs are Blind&lt;/a&gt;, but the TL;DR is: LLMs are currently actually blind,
and they present the illusion of having vision capabilities by using
what are essentially screenreaders / captioning tools.&lt;/p&gt;
&lt;p&gt;Do &lt;em&gt;not&lt;/em&gt; expect LLMs to be able to do more than copy/figure
out the rough page structure based on a screenshot.&lt;/p&gt;
&lt;p&gt;A half-fix for this issue is Playwright MCP, which lets the LLM
interact directly with the compiled/rendered version of an app at the
browser engine level. Through Playwright, an LLM can precisely grab
color hex codes or CSS properties, or copy div nesting structure. It can
even sort of understand global page layout through the screenshot API.
But because LLMs are currently blind, you will not be able to use LLMs
to fine-tune visual alignment and other types of size/shape
matching.&lt;/p&gt;
&lt;p&gt;In between Figma/Adobe working on this problem, and improvements in
the base LLMs, I do expect this problem to go away on the 1-2 year
timescale.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Overall, working with an AI coding assistant is much like being the
tech lead for a project. Many of my tips would be equally applicable
five or ten years ago, with junior engineers as the beneficiaries,
rather than coding assistants. Today, I think the big difference between
juniors and coding assistants is the ability to introspect about “why
does this codebase feel so frustrating to code in?”, and to figure out
how to improve the situation. In part, it’s this introspection process
that produces senior engineers. We’ll see whether LLMs can learn to
develop &lt;a href="/essays/taste/"&gt;taste&lt;/a&gt; - that’s when humans will
really be in trouble!&lt;/p&gt;
</summary></entry><entry><title>Startup update 20: LLM Zombies</title><link href="https://www.moderndescartes.com/essays/9_24_2025" rel="alternate"></link><published>2025-09-24T00:00:00Z</published><updated>2025-09-24T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-09-24:/essays/9_24_2025</id><summary type="html">

&lt;p&gt; Originally posted 2025-09-24&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/cartesian_tutor"&gt;cartesian tutor&lt;/a&gt;, &lt;a href="/essays/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;h2 id="progress-update"&gt;Progress update&lt;/h2&gt;
&lt;p&gt;This week, I worked on building a “Review problem with AI” function
that would import any question on my website into an AI for an AI-guided
interactive solve experience. I originally thought that this would be
just a convenience feature, so that students wouldn’t have to go figure
out how to copy-paste the problem (and all associated images) into
ChatGPT if they wanted a solution.&lt;/p&gt;
&lt;p&gt;Then, I tried using it, and found that Claude Sonnet 4 would simply
get the answer wrong about 5% of the time, and would generate bullshit
explanations about 30% of the time. These are relatively subtle
mistakes, too - many chemistry majors and even PhDs would likely make
similar mistakes! I tried GPT-5 pro and Opus 4.1 but none were really
satisfactory. The current best recourse is to &lt;em&gt;manually&lt;/em&gt; review
the AI-generated solution at a great time expense. I’ll continue
exploring because it is not sustainable for me to spend ~ 4 hours
reviewing/correcting solutions for &lt;em&gt;each&lt;/em&gt; exam. (It’s not easy
work, either - those 4 hours sap my mental energy enough to make the
rest of my day zero-productivity.)&lt;/p&gt;
&lt;p&gt;Props to the USNCO problem authors for so reliably coming up with
LLM-fooling problems.&lt;/p&gt;
&lt;p&gt;On the plus side, all this labor so far has generated a plethora of
“look how LLMs screw this up” fodder for a series of blog posts /
marketing materials that might drive more students to my site. TBD
whether I have to learn how to use TikTok or whether I’ll find a
different platform to market on.&lt;/p&gt;
&lt;p&gt;It is slightly surreal to me that I am seeing this rate of bullshit
in cutting-edge foundation models at the same time that these companies
are raking in gold medals at the IMO/IOI. There is definitely a
disconnect here. Maybe what’s going on is that the IMO/IOI models are
massively parallelized, with thousands of extended reasoning attempts,
and tiered models consuming each other’s output to try and distill and
adjucate between multiple attempts of unknown correctness into a final
solution. It cannot possibly be the same models available to mere
mortals?&lt;/p&gt;
&lt;p&gt;Anyway, if you work at a major AI training company and want to buy
some high-quality solution tokens, let me know at
&lt;a href="mailto:brian@cartesiantutor.com"&gt;brian@cartesiantutor.com&lt;/a&gt;.
See my new &lt;a
href="https://www.cartesiantutor.com/blog/posts/native_metals/"&gt;Cartesian
Tutor blog&lt;/a&gt; for an example of what these solution tokens might look
like.&lt;/p&gt;
&lt;h2 id="llm-zombies"&gt;LLM Zombies&lt;/h2&gt;
&lt;p&gt;If I had to broadly describe the errors I’m seeing, I would simply
say: LLMs are pattern-matching zombies, without any internal world
model.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~50% of errors were due to misreading images and diagrams - in line
with my observations at &lt;a href="/essays/blind_llms"&gt;LLMs are Blind&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;LLMs misidentify glassware types (e.g. they mix up burette,
condenser, and addition funnel, which all look like upright cylinders);
I could see them not making this mistake next year.&lt;/li&gt;
&lt;li&gt;LLMs jump to conclusions about what they’re seeing; in this problem,
they make assorted mistakes, like assuming the furanose form of glucose
in option (D) is fructose, or somehow thinking they’re looking at a
disaccharide. This is probably related to the contextless image encoding
architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src="/static/usnco_2025_problem_60.png" /&gt;
&lt;ul&gt;
&lt;li&gt;LLMs also fail to “see” 2D renderings of molecules in 3D space,
which is a problem I don’t expect to be solved in foundation models for
at least 5-10 years.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;When I retried some of these problems with GPT-5, Claude Opus 4.1,
Gemini 2.5 Pro, they struggled with many of these questions. For
example: when asked whether a reaction had a high entropy change, Opus
4.1 would claim that &lt;span class="math inline"&gt;\(\ce{C60 (buckyball)
-&amp;gt; 60C (graphite)}\)&lt;/span&gt; generated 59 net particles, so therefore
it was high entropy. However, each sheet of graphene binds together far
more than 60 carbon atoms, on average, so the net number of particles
would go down. GPT-5 and Gemini-2.5 Pro did not make this particular
mistake but made others.&lt;/li&gt;
&lt;li&gt;I also saw mistakes where an LLM would claim that a factor was
negligible because it was so small. That might be true if it was
simplifying an addition &lt;span class="math inline"&gt;\(x + \epsilon \approx
x\)&lt;/span&gt;, but here, it was actually simplifying a multiplication &lt;span
class="math inline"&gt;\(x\epsilon \approx x\)&lt;/span&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many of these issues are just… really, completely inexcusable
mistakes, from a human point of view. Well, at least there’s a business
for me to build.&lt;/p&gt;
</summary></entry><entry><title>Startup update 19: Email scraping; Taste</title><link href="https://www.moderndescartes.com/essays/9_16_2025" rel="alternate"></link><published>2025-09-16T00:00:00Z</published><updated>2025-09-16T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-09-16:/essays/9_16_2025</id><summary type="html">

&lt;p&gt; Originally posted 2025-09-16&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/cartesian_tutor"&gt;cartesian tutor&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;h2 id="progress-update"&gt;Progress update&lt;/h2&gt;
&lt;p&gt;Not much of note, website feature-wise. I finished my Stripe
integration! (insert Scrooge duck diving into bathtub of gold meme.)&lt;/p&gt;
&lt;p&gt;On the marketing front, I set up some semi-automated email scraping,
where I looked into past winners of the USNCO, identified the schools
they came from, figured out which chemistry teachers coached/advised
them, and then looked up the teachers’ email addresses. The workflow was
centralized around a single CSV file, to which I requested Claude do
various lookups and to add in data. Unfortunately, Claude completely
failed at reading some of the USNCO results PDFs, so I had to ask Gemini
to transcribe them, and then dump the text into Claude to have it
transfer it into the CSV. Claude also completely failed at web scraping
emails, so I set up an MCP server wired up to Perplexity so that it
would delegate its email scraping to Perplexity. So in the end, my
system was a Frankenstein of Claude as Orchestrator, with Gemini as
specialized image-transcriber and Perplexity as specialized web search
agent. This system got me ~100 teacher emails, to which I will slowly
trickle out some cold emails over the next week or two.&lt;/p&gt;
&lt;p&gt;The email scraper I built might actually make for a decent
single-purpose website that accepts CSV/excel/text files and for-loops
over them to scrape emails for you. I’d guess it would take a week or
two to set up properly, and would be an interesting exercise in
configuring a production Claude orchestration agent. It would probably
also make more money than the chemistry tutoring thing.&lt;/p&gt;
&lt;h2 id="taste"&gt;Taste&lt;/h2&gt;
&lt;p&gt;I wrote &lt;a href="/essays/taste"&gt;Taste&lt;/a&gt; to explore some of the
swirling ideas in my head. Many ideas didn’t solidify in time for the
Taste essay. These are my answer to the &lt;a
href="/essays/taste/#conclusion"&gt;concluding question of that
essay&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When is a taste-based product or service profitable? Is profit
necessarily in opposition to taste? What systems of governance align
these two incentives?&lt;/li&gt;
&lt;li&gt;What is the nature of the interaction between community, channels,
and platforms? Why have Bluesky, Lobste.rs, Nebula not really take off?
Why is Substack seemingly succeeding (and how did Medium screw it up so
badly)? Is Youtube’s advantage in its creators, rather than its
algorithm?&lt;/li&gt;
&lt;li&gt;Are there algorithmic solutions (a la Pagerank) that assign “taste”
scores to users as well as content? Can these algorithmic solutions
bootstrap good taste measurement abilities, or do low-taste equilibria
dominate? Perhaps early Google was successful precisely because PageRank
distilled the good taste of a very small, elite group of folks that
populated the early Internet, but everything’s been downhill since then,
given the parasitic pressure of SEO optimizers and eternal September of
tasteless new people on the internet.&lt;/li&gt;
&lt;li&gt;What is the process by which low-taste communities slowly bootstrap
themselves into a high-taste community? How does the pace of change, and
parasitic load of grifters affect how and whether this transition ever
happens?&lt;/li&gt;
&lt;/ul&gt;
</summary></entry><entry><title>Taste</title><link href="https://www.moderndescartes.com/essays/taste" rel="alternate"></link><published>2025-09-12T00:00:00Z</published><updated>2025-09-12T00:00:00Z</updated><id>tag:www.moderndescartes.com,2025-09-12:/essays/taste</id><summary type="html">

&lt;p&gt; Originally posted 2025-09-12&lt;/p&gt;
&lt;p&gt; Tagged: &lt;a href="/essays/tags/personal"&gt;personal&lt;/a&gt;, &lt;a href="/essays/tags/popular"&gt;popular ⭐️&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;

&lt;p&gt;A meditation on humanity’s last edge over AI.&lt;/p&gt;
&lt;h2 id="taste-is-art"&gt;Taste is art&lt;/h2&gt;
&lt;p&gt;Literally interpreted, taste is one of our five senses.&lt;/p&gt;
&lt;p&gt;Taste is multidimensional: salty, sweet, savory, sour, fatty,
mouthfeel, trigeminal, aroma.&lt;/p&gt;
&lt;p&gt;Taste is balance within dimensions: not too much nor too little.&lt;/p&gt;
&lt;p&gt;Taste is balance between dimensions: sweet paired with sour, fat with
acidity.&lt;/p&gt;
&lt;p&gt;Taste is necessarily subjective, as a visceral experience, yet also
objective, as people can generally agree which of two comparable dishes
is tastier.&lt;/p&gt;
&lt;p&gt;Taste also extends to the other domains. Fine fragrance, painting,
sculpture, architecture, music, poetry, calligraphy, and Go all share
the multidimensional, balanced, subjective, and objective aspects of
taste.&lt;/p&gt;
&lt;p&gt;We say someone has taste when they are capable of discerning options
that are more tasteful than others. All other things being equal, people
prefer more tasteful options to less tasteful options.&lt;/p&gt;
&lt;h2 id="taste-is-parsimony"&gt;Taste is parsimony&lt;/h2&gt;
&lt;p&gt;Claude’s coding output is functional, but hardly tasteful. It’s full
of unnecessary comments, unnecessary features, unnecessary layers of
abstraction, lacks organizing philosophy, and generates rampant
conceptual debt.&lt;/p&gt;
&lt;p&gt;Within mathematics, mathematicians share an understanding of &lt;a
href="https://en.wikipedia.org/wiki/Paul_Erd%C5%91s#:~:text=The%20Book"&gt;The
Book&lt;/a&gt;: an imaginary collection of the simplest, most elegant, most
beautiful, most accessible, most tasteful proofs of every theorem.&lt;/p&gt;
&lt;p&gt;Within engineering, engineers share an understanding of tasteful
engineering: systems that fulfill their stated goals, but exceed
expectations on resource efficiency, scalability, expediency, longevity,
versatility, or maintainability, due to a tasteful simplicity in
design.&lt;/p&gt;
&lt;p&gt;These are just some of the fields I’m most familiar with.&lt;/p&gt;
&lt;p&gt;Machine Learning brings us the concept of &lt;em&gt;regularization&lt;/em&gt; - a
penalty applied to all solutions according to their complexity, allowing
us to tiebreak between otherwise equivalent solutions by choosing the
simpler one. (Unfortunately, machine learning fails to deliver any
deeper insight into “complexity” - “bigger numbers are worse” is the
best we can come up with.)&lt;/p&gt;
&lt;p&gt;Taste is the regularization function of human endeavor.&lt;/p&gt;
&lt;h2 id="taste-is-the-field-behind-the-goalposts"&gt;Taste is the field
behind the goalposts&lt;/h2&gt;
&lt;p&gt;Effectiveness is complementary to taste.&lt;/p&gt;
&lt;p&gt;Options can be measurably compared among various axes. Once you
subtract these axes - taste is what remains.&lt;/p&gt;
&lt;p&gt;The arts are unique in that there are virtually no measurable axes;
they are &lt;em&gt;entirely&lt;/em&gt; taste-driven.&lt;/p&gt;
&lt;p&gt;One fascinating case study is Go. Go is a game with a very simple
measuring stick for effectiveness: the player with more board area under
control wins the game. Yet, Go is simultaneously a game with no
discernible rules for good play. Go’s tastefulness is thus an emergent
property of its strategic depth; the pursuit of taste in Go is the
pursuit of victory. The measurable superiority of AlphaGo Zero has, in
my opinion, destroyed much of what used to be tasteful about Go. I
believe Lee Sedol’s rationale for retiring from Go is exactly this
sentiment.&lt;/p&gt;
&lt;p&gt;This brings us to my core hypothesis: &lt;strong&gt;Taste is the field
behind the goalposts, into which the goalposts are constantly being
shifted.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Humanity’s edge over AI is that we are constantly expanding the field
behind the goalposts.&lt;/p&gt;
&lt;p&gt;In the second half of this essay, I want to discuss how humans create
taste.&lt;/p&gt;
&lt;h2 id="taste-is-cultural-distillation"&gt;Taste is cultural
distillation&lt;/h2&gt;
&lt;p&gt;To develop taste, it helps to have a community of practitioners who
are united in purpose. Community accelerates the development of taste by
showcasing examples of good taste, by furnishing peers who can give
targeted feedback, and by distilling a raw stream of content into a
curated trickle of exemplars.&lt;/p&gt;
&lt;p&gt;I’m a mostly self-taught software engineer and ML researcher, but I
couldn’t have pulled it off without community:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I learned from the best resources, as per community consensus: &lt;a
href="https://en.wikipedia.org/wiki/Structure_and_Interpretation_of_Computer_Programs"&gt;SICP&lt;/a&gt;,
&lt;a href="https://gameprogrammingpatterns.com/"&gt;Game Programming
Patterns&lt;/a&gt;, &lt;a href="https://docs.stripe.com/api"&gt;Stripe’s API&lt;/a&gt;,
Linear Algebra Done Right, &lt;a href="https://karpathy.ai/"&gt;Andrej
Karpathy&lt;/a&gt;, &lt;a
href="http://neuralnetworksanddeeplearning.com/"&gt;Nielsen&lt;/a&gt;, and
more.&lt;/li&gt;
&lt;li&gt;I received and gave mentorship through Google’s &lt;a
href="/essays/readability"&gt;Readability&lt;/a&gt; program.&lt;/li&gt;
&lt;li&gt;I’m a longtime follower of &lt;a
href="https://news.ycombinator.com"&gt;Hacker News&lt;/a&gt;, a community whose
collective taste creates a reliable filtering mechanism for content of
interest, and a forum to debate the merits of each post with subject
matter experts.&lt;/li&gt;
&lt;li&gt;The &lt;a href="http://recurse.com/"&gt;Recurse Center&lt;/a&gt; is a community
of curious programmers pushing their abilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What are the mechanisms by which a community distills good taste from
the collective efforts of its members?&lt;/p&gt;
&lt;p&gt;Today, we are perhaps most familiar with The Algorithm: a class of
solutions that rely on making statistical inferences over user
interaction data, like upvotes/downvotes, dwell time, watch time,
clickthrough rates, links, shares, retweets, and more. These solutions
have the advantage of scalability – there’s far too much video on
YouTube for anyone to watch it all – but overall, tend to promote
tasteless, lowest-common-denominator content.&lt;/p&gt;
&lt;p&gt;Of all of the mainstream content platforms, I think YouTube comes the
closest to algorithmically surfacing good content. Yet I find that it
&lt;em&gt;wants&lt;/em&gt; to veer into the tasteless, and it’s only through my
application of taste in downvoting tasteless content that my YouTube
recommendation feed stays clean.&lt;/p&gt;
&lt;p&gt;Overwhelmingly, I prefer curated channels over The Algorithm.
Channels – say, a newspaper, journal, magazine, blog, YouTube channel,
newsletter, mailing list, or even just shelf space in a retail
storefront – allow a curator to elevate content that they deem
tasteful.&lt;/p&gt;
&lt;p&gt;The interactions between its members’ taste, curators’ taste, and
reputational effects are what drive communities forward. Members choose
which curators to trust, and curators themselves are members of the
community; if a niche channel is trusted by other curators, then that
channel has influence far beyond its raw readership.&lt;/p&gt;
&lt;p&gt;Who and what determines whether a curator is trusted? Unfortunately,
taste is a self-referential problem: you need taste to pick the right
curators to trust. Curators are also not infallible. One failure mode is
“selling out” – trading your readers’ accumulated trust for money.
Another failure mode is simply not keeping ahead of the curve, as the
community collectively uplevels their taste.&lt;/p&gt;
&lt;p&gt;It’s possible for communities to exist in high-taste and low-taste
equilibrium states - a high-taste community is one in which tasteful
curators are identified and elevated, and a low-taste community is one
in which sellouts prosper because nobody has enough taste to tell the
difference. The replacement of traditional curation with algorithmic
distillation is likely responsible for pushing many online and even
offline communities into low-taste equilibrium states.&lt;/p&gt;
&lt;h2 id="taste-is-governance"&gt;Taste is governance&lt;/h2&gt;
&lt;p&gt;The collective taste of humans shapes our environment by determining
our leaders, our policies, our priorities, the set of goods and services
available for purchase.&lt;/p&gt;
&lt;p&gt;What are democratic elections, but a measurement of the taste of the
population? What is the free market, but a measurement of consumer
taste? What is a conclave, but a measurement of Catholic cardinals’
taste? What is royal succession, but a measurement of the current King’s
taste?&lt;/p&gt;
&lt;p&gt;Any attempt to measure collective taste runs afoul of politics. Whose
taste should be considered? Is it weighted, e.g. by headcount
(democracy), by money (capitalism), by insider status (elitism), by
bloodline (monarchy), by number of accounts (The Algorithm)? (&lt;a
href="https://en.wikipedia.org/wiki/The_Tyranny_of_Structurelessness"&gt;Structurelessness
is also a type of governance&lt;/a&gt;). What incentives are there to express
ones’ thoughtful opinion? What incentives are there to prefer the
collective good over personal greed?&lt;/p&gt;
&lt;p&gt;Different systems of governance try to solve these problems, and all
have their problems. These systems exist at all scales, from your local
franchise restaurant, to entire institutions (e.g. “modern art” or
“academia”), to countries. Over time, systems of governance, and their
particular incarnations, prove their ability or inability to continually
elevate curators with good taste. Unfortunately, these dynamics play out
over the timescales of decades to millenia, making it hard to judge
which systems are the best.&lt;/p&gt;
&lt;p&gt;I see competitive destruction – whether it be market competition,
war, or immigration – as the evolutionary force that selects for the
best governance.&lt;/p&gt;
&lt;h2 id="refining-your-taste"&gt;Refining your taste&lt;/h2&gt;
&lt;p&gt;As individuals, how can we refine our tastes? Some general
advice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Great artists have great taste. Being able to recognize good work is
a prerequisite to producing it.&lt;/li&gt;
&lt;li&gt;You grow your taste by actively analyzing &lt;em&gt;why&lt;/em&gt; one option is
more tasteful than another.&lt;/li&gt;
&lt;li&gt;You develop taste by exposing yourself to more tasteful content.
Finding a community is a fast-track to finding tasteful content.&lt;/li&gt;
&lt;li&gt;You also develop taste by creating. Creation highlights the inherent
limitations, constraints, and difficulties of the medium.&lt;/li&gt;
&lt;li&gt;“What if” exercises – where you intentionally drop one ingredient
and see what happens – are a great way to understand &lt;em&gt;why&lt;/em&gt;
something is necessary, if it is necessary at all!&lt;/li&gt;
&lt;li&gt;Always be looking for more tasteful communities. The beginner
community is rarely the same as the advanced community, and both have a
place in your journey.&lt;/li&gt;
&lt;li&gt;Not all “advanced” communities have taste. You need taste to judge
taste. (e.g. in software, many communities revolve around one flavor or
another of pedantry, which is not equivalent to taste.)&lt;/li&gt;
&lt;li&gt;Taste is contextual. Yet, someone with taste in one domain can
weakly judge taste in a different domain. This video of &lt;a
href="https://www.youtube.com/watch?v=99oj1r02hGA"&gt;Chef Wang, an
extremely tasteful Sichuan chef, trying a fine dining vegan
restaurant&lt;/a&gt; is &lt;em&gt;fascinating&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Having dissected the phenomenon of &lt;em&gt;taste&lt;/em&gt; along many
dimensions, I think it’s appropriate to ask: what remains? The answer is
left as an exercise to the tasteful reader. (&lt;a
href="/essays/9_16_2025/#taste"&gt;You can find my answers here&lt;/a&gt;.)&lt;/p&gt;
</summary></entry></feed>