Startup update 3: Gell-Mann AmnesiAI

Originally posted 2025-05-12

Tagged: cartesian_tutor, llms

Obligatory disclaimer: all opinions are mine and not of my employer


Progress update

This week, I learned more than I ever wanted to about Svelte’s reactivity rules and when updates get triggered. As a backend engineer, I had hoped that vibe coding would simply obviate the need to know frontend; it does not. For maximum effectiveness, you basically have to operate at the L5/Senior Engineer level to use these tools well. Coding LLMs are like a junior engineer on cocaine: a little frothy at the mouth and occasionally genius, but absolutely requiring supervision.
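The Svelte gotcha that bit me most is that reactivity is assignment-based: mutating an array in place (`messages.push(...)`) does not trigger an update, while reassigning the variable does. A minimal sketch of the pattern, with hypothetical names (`Message`, `appendToken`) that are mine, not from my actual codebase:

```typescript
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Append a streamed token to the last message, returning a NEW array.
// In a Svelte component you would write `messages = appendToken(messages, tok)`
// so the reassignment triggers reactivity; a bare `messages.push(...)` would not.
function appendToken(messages: Message[], token: string): Message[] {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  return [...messages.slice(0, -1), { ...last, text: last.text + token }];
}
```

Pure functions like this also keep the streaming logic testable outside the component.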

At this point, I’ve got a chat UI that is correctly wired to my backend, correctly updates its state on user input and streaming LLM responses (getting autoscroll to follow the bottom of the conversation was, for some reason, really hard for the LLM), and what I think is a reasonably factored Svelte codebase. My devloop is also ironed out, with autoreloading scripts for both backend and frontend and an automatic client codegen tool for my backend API. Hopefully it’s all smooth sailing from here.
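For the curious, the autoscroll trick that the LLM kept fumbling boils down to one decision: only snap to the bottom when the user is already near it, so new tokens don’t yank the view down while someone is reading history. A hedged sketch (the function name and the 40px threshold are my own choices, not a standard API):

```typescript
// Decide whether the chat view should follow new content. Arguments map
// onto a DOM element's scrollTop / clientHeight / scrollHeight.
function shouldStickToBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  thresholdPx = 40
): boolean {
  // Distance between the bottom edge of the viewport and the content bottom.
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

In a Svelte component, you would check this before each streamed chunk and, if true, set `el.scrollTop = el.scrollHeight` after the DOM updates (e.g. after `await tick()`).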

Future work and time estimates:

  • I spent 1 week setting up my backend.
  • I spent 1 day vibecoding my frontend, then 2 weeks learning frontend and debugging the vibecoded output.
  • 1 more week fleshing out features on my backend/frontend
  • 1 more week for the paperwork/overhead of setting up a company, Stripe billing, OAuth login, domain name registration, signing up for startup credit freebies, etc., and setting up the production instance of my app.
  • 1 week to build out the account creation and account linking flow and permissions model, and to test the basic user journeys. (Roughly: parents and kids get separate but linked accounts; parents should be able to chat with my agent about their kids’ progress, but not view raw chat transcripts; kids should be able to view all of their own past chat transcripts.)
  • 1 week to do content scripting - onboarding, lesson plan generation, problem+solution key generation, concept deep dives, quizlets.
  • 1 week for launch miscellany - setting up the non-app part of my webpage (home page, etc.)
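The parent/kid permission model above is small enough to sketch as a single predicate. This is an illustrative sketch under my own assumptions (the `Account` shape, `linkedIds`, and the action names are hypothetical), not the eventual implementation:

```typescript
type Role = "parent" | "child";

interface Account {
  id: string;
  role: Role;
  linkedIds: string[]; // linked family accounts
}

type Action = "view_transcripts" | "discuss_progress";

// A child may view their own transcripts; a linked parent may discuss
// progress with the agent but never read raw transcripts.
function canAccess(actor: Account, targetId: string, action: Action): boolean {
  const ownOrLinked = actor.id === targetId || actor.linkedIds.includes(targetId);
  if (!ownOrLinked) return false;
  if (action === "view_transcripts") {
    return actor.role === "child" && actor.id === targetId;
  }
  // discuss_progress: parents about linked kids, kids about themselves.
  return true;
}
```

Keeping the policy in one pure function makes the basic user journeys easy to enumerate in tests before wiring it into API handlers.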

Current launch timeline is now mid-June.

AI Coding notes

My LLM philosophy at this point is:

  • Vibecode to set up a template of your code - LLMs can spit out multi-file, multi-hundred-line codebases that basically work, which is akin to cloning a starter codebase.
  • Get really nitpicky about code architecture; set up the LLM for long-term success by refactoring the code using all the appropriate best practices. You can use LLMs to refactor the code; they just need to be told how/what to refactor.
  • When refactoring, LLMs often leave behind vestigial code. To prevent future confusion (for both you and the LLM), tell them to repeatedly simplify the code. Eventually it gets so simple that features actually stop working; then revert the latest change, and you’ve cut down to exactly the right level.
  • Let the LLM pattern-match on your existing code to continue generating new features. (This part is honestly not that different from how engineers work in large company codebases.)
  • Continue being anal about code architecture.

More LLM thoughts

Will LLMs improve to the point where they don’t need supervision? In a recent discussion, Claude Code engineers talked about having LLMs automatically enforce coding norms by listing them out. I’ve been doing something similar, using an LLM style editor to edit-check my essays. I’m skeptical that LLM coding style checkers will elevate LLMs from junior to mid-level engineer: while my edit-checker often highlights real style issues and provides useful suggestions, it equally often makes useless ones.

It makes me wonder how good LLMs will actually be in non-coding settings. Is it just Gell-Mann amnesiAI for me to be skeptical of vibe-coding’s effectiveness in frontend work, yet optimistic that LLMs will eliminate my need to hire/find a cofounder with stronger marketing/sales/managerial skills?

Coding might be special: the verbal blather of LLMs doesn’t stand up to the precise thinking that a compiler or asynchronous coding requires, and there is something powerful about letting the LLM duke it out in a loop with compiler/linter errors until they are both satisfied. And yet, Anthropic absolutely doesn’t trust Claude to manage the long-term health of its own codebase.

Swyx [00:31:36]: I think the… So at this point, I just, you know, I want to… This tagline is in my head that basically at Anthropic, there’s Claude Code generating code. And then Claude Code also reviewing its own code. Like, at some point, right? Like, different people are setting all this up. You don’t really govern that. But it’s happening.

Boris [00:31:53]: Yeah, we have to be, you know, at Anthropic, there’s still a human in the loop for reviewing.

Swyx [00:32:13]: We have, you know, VPs of N, CTOs listening. Like, this is all well and good for the individual developer. But the people who are responsible for the tech, the entire code base, the engineering decisions, all this is going on. My developers, like, I manage, like, 100 developers. Any of them could be doing any of this at this point. What do I do to manage this? How does my code review process change? How does my change management change? I don’t know.

Cat [00:32:48]: We’ve talked to a lot of VPs and CTOs. Yeah. They’re really excited about it. They actually tend to be quite excited because they experiment with the tool. They download it. They ask it a few questions. And, like, Claude Code, when it gives them sensible answers, they’re really excited because they’re like, oh, I can understand this nuance in the code base. And sometimes they even ship small features with Claude Code. And I think through that process of, like, interacting with the tool, they build a lot of trust in it. And a lot of folks actually come to us and they ask us, like, how can I roll it out more broadly? And then we’ll often, like, have sessions with, like, VPs of Dev Prod and talk about these concerns around how do we make sure people are writing high-quality code. I think in general, it’s still very much up to the individual developer to hold themselves up to a very high standard for the quality of code that they merge. Even if we use Claude Code to write a lot of our code, it’s still up to the individual who merges it to be responsible for, like, this being well-maintained, well-documented code that has, like, reasonable abstractions. And so I think that’s something that will continue to happen where Claude Code isn’t its own engineer that’s, like, committing code by itself. It’s still very much up to the ICs to be responsible for the code that’s produced. Yeah.

But at the same time, most other fields, excepting maybe law, don’t seem to demand the level of precise thinking that coding does. Obviously there is something to those “imprecise” jobs - I don’t think for a second that I could survive a week as a marketer or salesperson. But perhaps those fields have a lower skill ceiling, or perhaps they are nontechnical in a way whose nuance LLMs will capture more readily. The alternative hypothesis is that humans are really Generally Intelligent in a way that lets us capture the fractal complexity of reality, and that our understanding and capability in any domain is proportional to log(cumulative person-hours spent trying to get better at X).