Startup update 5: Officially incorporated!

Originally posted 2025-05-24

Obligatory disclaimer: all opinions are mine and not of my employer

Progress update

I am officially incorporated as Cartesian Tutor, Inc! It was a true RPG quest getting this working: various production components need a credit card, which needs a bank account, which needs an EIN, which needs the company to be registered, and so on. Props to Stripe Atlas for making this all so easy and outlining the steps and dependencies.

The other annoying thing I tackled this week was authentication - I now have a "Sign in with Google/Facebook/Apple" modal. That took 2-3 days to wire up backend + frontend + Auth0’s configuration all correctly. AI was pretty useless in this regard; e.g. I wasted half a day debugging why my frontend wasn’t sending the auth tokens to the backend, only to realize after very carefully reading the docstrings on the generated client that it only sends auth tokens if the backend advertises itself as needing auth (and I hadn’t built the backend half yet!).
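To make that failure mode concrete, here is a minimal sketch of how a spec-driven client might decide whether to attach a token. The `OperationSpec` type, the `requiresAuth` flag, and `buildHeaders` are all hypothetical names for illustration; the actual generated client may work differently.

```typescript
// Hypothetical shape of one generated API operation. Real generators differ.
interface OperationSpec {
  path: string;
  // True only if the backend's API description marks this operation as secured.
  requiresAuth: boolean;
}

// Build request headers the way a spec-driven client might.
function buildHeaders(op: OperationSpec, token: string | null): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  // The token is attached only when the spec asks for it, so an unsecured
  // spec silently produces unauthenticated requests even if a token exists.
  if (op.requiresAuth && token !== null) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}

// Same endpoint, before and after the backend declares that it needs auth.
const beforeBackendAuth: OperationSpec = { path: "/conversations", requiresAuth: false };
const afterBackendAuth: OperationSpec = { path: "/conversations", requiresAuth: true };

console.log(buildHeaders(beforeBackendAuth, "abc123")); // no Authorization header
console.log(buildHeaders(afterBackendAuth, "abc123")); // includes Authorization header
```

Under this assumption the frontend code is not the problem: the token never leaves the browser until the backend's description says the endpoint is secured.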

The time I spent building out and testing my database migration strategy in week 1 is paying off, as every backend change has been nearly trivial so far. I’ve cleared out nearly all of the shitty vibecoded frontend code from weeks 2 and 3 by now. Almost all of it had to be redone because it was full of race conditions and crappy callback/event firing loops, which were now all firing before auth tokens were ready and weren’t refreshing or clearing themselves out when a user logged in or out.

Vibecoding assistants are really, really harebrained when it comes to frontend - when you say something like “The listConversations call should only fire after auth is ready”, then it interprets this command as “Add another nested level of callback hell that chains listConversations after auth”. Frontend frameworks like Svelte and React are supposed to provide a framework for managing all of this state complexity and callback hell - and yet the AI never uses the framework properly; it just resorts to adding another spaghetti noodle to the dish. I strongly suspect these vibecoded apps will start collapsing under their own complexity at roughly the stage I’m at: 3-5 major intersecting features.
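As a rough illustration of the alternative to nesting callbacks, the sketch below treats the conversation list as something derived from auth state in one place. It is framework-agnostic: `AuthStore` is a hypothetical stand-in for a Svelte store or React context, and `listConversations` here is a fake that returns canned data, not the real API call.

```typescript
type AuthState =
  | { status: "loading" }
  | { status: "loggedOut" }
  | { status: "loggedIn"; token: string };
type Listener = (state: AuthState) => void;

// Minimal observable store, standing in for a framework's state primitive.
class AuthStore {
  private state: AuthState = { status: "loading" };
  private listeners: Listener[] = [];
  get(): AuthState {
    return this.state;
  }
  set(next: AuthState): void {
    this.state = next;
    for (const listener of this.listeners) listener(next);
  }
  subscribe(listener: Listener): void {
    this.listeners.push(listener);
    listener(this.state); // deliver the current state immediately
  }
}

// Hypothetical stand-in for the real API call.
async function listConversations(token: string): Promise<string[]> {
  return [`conversation for ${token}`];
}

let conversations: string[] = [];

// One place decides what the conversation list should be for each auth state.
function syncConversations(auth: AuthStore): void {
  auth.subscribe(async (state) => {
    if (state.status !== "loggedIn") {
      conversations = []; // clear on logout, or while auth is still loading
      return;
    }
    const result = await listConversations(state.token);
    // Drop stale responses if the user logged out or switched accounts meanwhile.
    const current = auth.get();
    if (current.status === "loggedIn" && current.token === state.token) {
      conversations = result;
    }
  });
}

const auth = new AuthStore();
syncConversations(auth);
auth.set({ status: "loggedIn", token: "demo-token" });
```

The point of the design is that "fire after auth is ready", "refresh on login", and "clear on logout" are all handled by the same subscription instead of by three separately chained callbacks.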

My work time these days is spent as roughly 50% reading the docs + learning the material well enough to understand the vibecode output, 10% vibecoding, 20% fixing up the code, and 20% testing the app. As far as coding goes (10% vibecode + 20% manual fixup), I agree with the 2-3x speedup numbers, but the non-sped-up parts matter more now. Classic Amdahl’s law result.
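A back-of-the-envelope version of that Amdahl’s law point, under three assumptions: the percentages above describe time as currently spent (with AI), coding is the only part that got faster, and \(s\) (a symbol introduced here) is the coding speedup. The 30% now spent on coding would have cost \(30s\) without AI, while the other 70% is unchanged, so the overall speedup \(S\) is

\[
S = \frac{70 + 30s}{100}, \qquad S\big|_{s=2} = 1.3, \qquad S\big|_{s=3} = 1.6 .
\]

So a 2-3x coding speedup would work out to only about 1.3-1.6x overall under these assumptions.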

LLMs are still useful, though! I asked one to exclude generated files from eslint, and it whipped up a negative lookahead regex to exclude that directory. I was half-expecting that my linter should, y’know, have a configuration option to exclude directories, but this works too.
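For what it’s worth, ESLint does have such an option. A minimal sketch, assuming the project uses ESLint’s flat config format (`eslint.config.js`); the `src/generated` path is a placeholder, since the real directory name isn’t given here:

```typescript
// Sketch of an ESLint flat config, written as plain JavaScript that is also
// valid TypeScript. "src/generated" is a placeholder directory name.
const config = [
  // A config object containing only `ignores` acts as a global ignore list.
  { ignores: ["src/generated/**"] },
  // ...the project's existing rule configuration would follow here.
];

// In a real eslint.config.js this array would be the default export:
// export default config;
```

The CLI also accepts `--ignore-pattern` for one-off runs.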

I talked to a lot of people this week, including a few prospective customers/friends, and realized that it’s not enough to have a Socratic concept tutor; you also need problems for the kids to work on. I have also narrowed in on math as the focus subject for the initial launch. I’ve talked with my high school’s math team and hope to give a demo + pitch the students on using the tool at the two-week mark - so that sets a pretty hard deadline for having a working demo :D

LLM is still just a fancy Markov chain

So then I played around with a Socratic problem-solving tutor and threw a moderately difficult problem at it (say, mid-late AIME). This was a problem that I did not know how to solve, nor was it apparent to me at first glance how I might even begin to solve it.

At first, the AI seemed to be helpful, providing hints and encouragement. And then, several algebraic messes later, I soon realized that it had no fucking clue what it was talking about. It happened to be very good at pattern matching and suggesting next steps - e.g. when it saw the expression \(-(a^3 + b^3 + c^3) + ab^2 + ac^2 + bc^2 + ba^2 + cb^2 + ca^2\), it said,

“Good! And can you recall a useful identity about sums of cubes? Specifically, if we look at \((a+b+c)(a^2+b^2+c^2−ab−bc−ca)\), what does this expand to? This might help simplify your expression.”
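For reference, the identity being hinted at is the standard sum-of-cubes factorization (a general fact, not something specific to this problem):

\[
(a+b+c)\,(a^2+b^2+c^2-ab-bc-ca) = a^3+b^3+c^3-3abc .
\]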

And yes, very good - I’d forgotten about that identity and was like, great! let’s wade into this mess and solve the problem. Only to realize that it actually was a total dead end. So the AI knows how to predict the next step of the proof; it just has no global sense of whether it’s going anywhere useful. In that regard, it feels like a Markov chain operating on higher-level “units” of proof. Which is a significant step forward, and might even be RL’able into a decent math model. (I used the recently released Claude 4 Sonnet, which in theory should have all this RL’ing built in.)

So I think ultimately, what I need is a solution key that the tutor can refer to as a roadmap. And this solution key would also have to have multiple possible solutions, since I don’t want it telling the student they are wrong when the student was merely discovering a new solution. I think that means I should be scraping the AoPS AMC/AIME problems wiki to start - e.g. this sort of thing.
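A minimal sketch of what such a solution key might look like as data, and how it could be folded into the tutor’s context. Every name here (`ProblemKey`, `Solution`, `buildTutorContext`) is hypothetical, the sample problem is a deliberately trivial placeholder rather than an AIME problem, and none of this reflects the AoPS wiki’s actual structure.

```typescript
// All names below are hypothetical; this is not an AoPS wiki schema.
interface Solution {
  label: string; // short name for the approach
  steps: string[]; // ordered key ideas, usable as hint checkpoints
}

interface ProblemKey {
  statement: string;
  answer: string;
  solutions: Solution[]; // several valid roadmaps for the same problem
}

// Turn a solution key into text the tutor model can use as a roadmap.
function buildTutorContext(key: ProblemKey): string {
  const roadmaps = key.solutions
    .map(
      (s) =>
        `${s.label}:\n` +
        s.steps.map((step, i) => `  ${i + 1}. ${step}`).join("\n"),
    )
    .join("\n\n");
  return [
    `Problem: ${key.statement}`,
    `Final answer (do not reveal): ${key.answer}`,
    `Known solution paths:\n${roadmaps}`,
    "The student may find a valid path not listed here; check their reasoning before calling it wrong.",
  ].join("\n\n");
}

// Placeholder problem with two independent solution paths.
const sampleKey: ProblemKey = {
  statement: "Compute 1 + 2 + ... + 100.",
  answer: "5050",
  solutions: [
    { label: "Pairing", steps: ["Pair 1 with 100, 2 with 99, and so on", "There are 50 pairs, each summing to 101"] },
    { label: "Formula", steps: ["Use n(n+1)/2 with n = 100"] },
  ],
};

console.log(buildTutorContext(sampleKey));
```

Keeping `solutions` as a list is the piece that matters: the tutor gets more than one roadmap, plus an explicit instruction not to treat an unlisted approach as an error.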

That should be good enough for an MVP; synthetic content can follow later.

The other thing I tried was the non-Socratic thing (ChatGPT mode), where I just ask it to solve the problem from scratch. It went down the same incorrect path that I’d initially gone down, then randomly pulled some new identities out of its ass, ignored all the work that had been done up to that point, and went on to declare that it was proved. I think what happened was that the LLM saw “Given X, prove Y”, and it knew that the structure of the proof was that you start with X and end up with Y near the end. So it started from X and generated a complicated-ish looking algebraic proof. When it felt that it had gone on long enough, it decided it was time to finish, looked at Y, went one step backwards, invented some new lemmas that were one step away from Y, and then did that one step and declared the proof done. The illusion is strong, but at its core it really is just a token prediction machine rather than a thinking machine.

The irony of the name “Cartesian Tutor” is not lost on me.