Startup update 2: LLM musings

Originally posted 2025-05-05

Tagged: cartesian_tutor, llms

Obligatory disclaimer: all opinions are mine and not of my employer


Progress update

This week, I got a SvelteKit UI up and running, but I spent a lot more time understanding the background intricacies of the JavaScript ecosystem - how the toolchain (TypeScript -> JavaScript -> bundler) works, as well as the differences between Svelte, SvelteKit, and Vercel. I am currently figuring out what the right devloop looks like: I add a new API on the backend, a frontend client gets generated automatically, and then I can build that feature on the frontend.

(Is this the SaaS equivalent of a game developer spending all their time building a game engine instead of building a game?)

Anyway, next week’s goal looks the same: get a basic UI up and running against the LLM-streaming backend endpoints I’ve already set up.

“LLM-first” companies?

I’m also starting to think more about what it means to have an LLM-first company. I can imagine several degrees of LLM-first, corresponding roughly to the tiers outlined at https://www.moderndescartes.com/essays/gnugo_to_agz/#how-to-tackle-difficult-problems .

  1. A human uses Perplexity, Deep Research, or other LLM tooling to interactively plan, devise, and execute a task.
  2. A human kicks off a goal-oriented agent, which uses an LLM to autonomously plan and execute a task, optionally with human checkpoints / verification steps built in.
  3. A human kicks off a goal-oriented agent, which uses an LLM to autonomously plan and execute a task; a secondary LLM acts as an automated critic/evaluator of the first LLM’s actions, allowing the whole thing to run autonomously. The whole system is wired up to a paperclip maximizer loop to maximize $$$, possibly with human analysis to attribute success to the right places and to catch/neuter anomalous behavior from the LLMs.
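To make the worker/critic part of stage 3 concrete, here is a minimal sketch of the control loop. Everything in it is an assumption for illustration: `run_with_critic`, the `ACCEPT`/`REJECT` verdict convention, and the toy stub functions are hypothetical, and a real system would replace the stubs with actual LLM calls and add the human analysis described above.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a real LLM call: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class Attempt:
    output: str
    verdict: str


def run_with_critic(goal: str, worker: LLM, critic: LLM, max_rounds: int = 3) -> list[Attempt]:
    """Let a worker LLM attempt the goal and a critic LLM accept it or send feedback."""
    attempts: list[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        output = worker(f"Goal: {goal}\nFeedback so far: {feedback or 'none'}")
        verdict = critic(f"Goal: {goal}\nOutput: {output}")
        attempts.append(Attempt(output, verdict))
        # Assumed convention: the critic's reply starts with ACCEPT when satisfied.
        if verdict.startswith("ACCEPT"):
            break
        feedback = verdict
    return attempts


# Toy stubs so the loop can be run without any model behind it.
def toy_worker(prompt: str) -> str:
    return "draft v2" if "too short" in prompt else "draft"


def toy_critic(prompt: str) -> str:
    return "ACCEPT" if "draft v2" in prompt else "REJECT: too short"


if __name__ == "__main__":
    for attempt in run_with_critic("write a tagline", toy_worker, toy_critic):
        print(attempt)
```

The `max_rounds` cap is the simplest possible guard against a worker and critic that never converge; it is also a natural place to insert a human checkpoint.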

We’re very much at stage 1 for most disciplines, with an attempt at stage 2 for coding happening with Claude, Cursor, Windsurf, etc. But I think it ultimately boils down to an information theory problem: if it takes N bits to specify a task to an LLM, then you cannot possibly have more than, say, 2^N distinct behaviors from the LLM. It’s the same reason why LLMs are great at producing Flappy Bird clones but require a ton of additional prompting to customize further. How can we resolve this information bandwidth paradox?
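The 2^N bound is just the pigeonhole principle, and it holds exactly when the model is deterministic (for example, greedy decoding); sampling adds randomness, but not randomness the person writing the prompt controls. Here is a small sketch that checks the bound by brute force. The `toy_model` function is a hypothetical stand-in for an LLM, not a real one.

```python
import hashlib
from itertools import product
from typing import Callable


def toy_model(prompt_bits: str) -> str:
    # Hypothetical deterministic "LLM": any fixed function of the prompt will do.
    return hashlib.sha256(prompt_bits.encode()).hexdigest()


def count_distinct_behaviors(model: Callable[[str], str], n_bits: int) -> int:
    """Feed every possible n-bit prompt to the model and count distinct outputs."""
    prompts = ("".join(bits) for bits in product("01", repeat=n_bits))
    return len({model(prompt) for prompt in prompts})


if __name__ == "__main__":
    n = 8
    print(count_distinct_behaviors(toy_model, n), "distinct outputs; bound is", 2**n)
```

A model that ignores most of its prompt lands far below the bound, which is the Flappy Bird situation: many different short prompts collapse onto the same generic output.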

I’m reminded of how when a company hires a new employee, it typically takes ~6 months for the employee to come to full productivity. A lot of this timeframe is social: the company has a certain working rhythm, established communication patterns, decision-making patterns (whether explicit or implicit), ways of resolving alignment issues between subteams, and so on, and it just takes time to learn who to talk to and how things work around here. Often, a lot of this “how stuff works” is documented post-hoc in the form of decision docs, process documentation, and written or all-hands communication from senior leaders.

A fresh hire gets a very different set of instructions compared to a seasoned employee - the latter just “gets it” and doesn’t need as much handholding or task specification. How can we get LLMs from “fresh hire, every fricking time” to “it just gets it”? LLMs need context, too - something like a company mission statement, a nuts-and-bolts review of what the company’s product is, who it sells to, who its competition is, strategic positioning within the ecosystem, what image the company would like to cultivate, working style, communication patterns, etc. Usually this is implicit in a people-centric company, but in an LLM-centric company, it has to be written down and fed in as context/instructions to the LLM. You would probably need additional fine-tuning of working modes for each job function.

At the startup scale, I think a simple implementation of stage 2 for miscellaneous tasks need not be too complicated - a “corporate” directory with all of the relevant context and docs in plain markdown format, and an LLM agent capable of picking relevant context to read.
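As a rough sketch of what that could look like, the snippet below scans a directory of markdown files and prepends the most relevant ones to a task prompt. The function names and directory layout are hypothetical, and the keyword-overlap scoring is a deliberately crude stand-in for the "LLM agent picks what to read" step: a real agent would more likely be shown a file listing and asked which docs it wants.

```python
import re
from pathlib import Path


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_context(task: str, corporate_dir: Path, top_k: int = 3) -> list[Path]:
    """Return up to top_k markdown docs sharing the most words with the task."""
    task_words = tokenize(task)
    scored = []
    for doc in sorted(corporate_dir.rglob("*.md")):
        overlap = len(task_words & tokenize(doc.read_text(encoding="utf-8")))
        if overlap:  # skip docs with nothing in common with the task
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(task: str, corporate_dir: Path, top_k: int = 3) -> str:
    """Concatenate the picked docs and the task into one prompt string."""
    sections = [
        f"## {doc.name}\n{doc.read_text(encoding='utf-8')}"
        for doc in pick_context(task, corporate_dir, top_k)
    ]
    return "\n\n".join(sections + [f"## Task\n{task}"])
```

Calling `build_prompt("Draft a tagline for the homepage", Path("corporate"))` would then produce a single string to hand to whichever LLM API is in use; the `corporate` path here is only an example.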