<p>Modern Descartes - Essays by Brian Lee (<a href="https://www.moderndescartes.com/essays">moderndescartes.com/essays</a>) - “I seek, therefore I am”</p>
<h1>Finding New Mountains to Climb</h1>
<p> Originally posted 2024-02-01</p>
<p> Tagged: <a href="/essays/tags/personal">personal</a></p>
<hr />
<p>A long overdue personal update!</p>
<h2 id="machine-learnings-manifest-destiny">Machine Learning’s Manifest
Destiny</h2>
<p>Manifest Destiny, for my non-American and otherwise history-unaware
readers, comes from the era of American history where settlers rushed
westward in pursuit of the “free” land that was being doled out by the
U.S. government. The Destiny was that America would eventually grow from
coast to coast to create a glorious future. The omitted fine print was
that you might have to evict the native inhabitants of that land. It’s a
historical era with strong similarities to recent times.</p>
<p>I’m, of course, talking about machine learning circa 2015-2020. It was
obvious to everyone in ML that there was so much low-hanging fruit
across every subject, and ML was destined to pick that fruit and move
the state of the art forward to a glorious future. There were
researchers scattering in every direction, hoping to stake claims in
“unoccupied” territory and be the first to apply ML to a problem. Like
the original Manifest Destiny, it ignored existing subject matter
inhabitants and betrayed the arrogance of thousands of ML researchers.
But with breakthroughs like AlphaGo, AlphaFold, incredible computer
vision accuracy and voice synthesis, some arrogance was warranted!</p>
<p><a href="https://www.moderndescartes.com/essays/my_ml_path/">At the
start of 2017</a>, I had just wrapped up a nine month sabbatical where I
taught myself machine learning, and started working as an ML engineer at
Verily. I had a great time with my Noogler project, which was to
investigate the potential utility and data readiness of an electronic
medical record dataset. The answer was a pretty resounding “not ready”,
as documented in “<a
href="https://www.moderndescartes.com/essays/deep_learning_emr/">Deep
learning on EMRs is doomed to fail</a>”. Still, this work was exactly
what I’d hoped to do in the big leagues, and I was excited for my next
project.</p>
<p>Unfortunately, the next project didn’t live up to my expectations. We
built some data pipelines that were already <a
href="https://www.moderndescartes.com/essays/sql_join/">tech-debt
laden</a> on delivery, and then it turned out that what we’d built was a
complete mismatch to what the business actually needed, which was two
business analysts and a SWE. Instead, our team of 10 SWEs/ML engineers
built a janky pipeline that needed 4 SWEs just to keep afloat.</p>
<p>Frustrated with this project, I started putting my energy into
Minigo, an ambitious attempt to replicate AlphaGo Zero. (Spoiler alert:
<a href="https://openreview.net/forum?id=H1eerhIpLV">we succeeded</a>.)
It was technically a 20% project, but was effectively a second 80%
project. It was fun work, though, which is why I didn’t mind. On the
strength of that work, I was able to transfer into Google Brain in 2018
to work on <a
href="https://www.tensorflow.org/guide/function#autograph_transformations">TensorFlow
2’s AutoGraph</a> feature.</p>
<p>AutoGraph is a Python-to-Python compiler that transforms imperative
Python control flow into the equivalent functional tf.cond and
tf.while_loop calls, making it easier to express complex control flow -
things like <a
href="https://deepmind.google/discover/blog/wavenet-a-generative-model-for-raw-audio/">WaveNet</a>,
beam search, and meta optimizers. It got me acquainted with compilers
(which was a big gap in my CS knowledge), and taught me how to unleash
Google’s systems to automatically run millions of lines of internal
TensorFlow code as a test input to our compilers. It was the perfect
demonstration of how Google allowed research engineers to work at the
intersection of powerful engineering infrastructure and cutting-edge ML
research.</p>
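<p>To make the transformation concrete, here is a minimal sketch (my own
illustration, not from the TensorFlow docs) of the kind of code AutoGraph
handles: an imperative Python <code>if</code> on a Tensor becomes a
<code>tf.cond</code> in the traced graph.</p>
<pre><code>import tensorflow as tf

@tf.function
def clip(x):
    # AutoGraph rewrites this Python `if` into a tf.cond when x is a
    # Tensor, so the branch lives inside the graph rather than being
    # fixed at trace time.
    if x > 10:
        x = tf.constant(10.0)
    return x

print(clip(tf.constant(25.0)))  # tf.Tensor(10.0, shape=(), dtype=float32)</code></pre>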
<p>After TF2 launched, I transitioned to working on graph neural
networks for organic chemistry and, more specifically, olfaction. As an
ex-chemist, I’d never imagined that the field I’d abandoned in 2011
would be relevant again. Our team got some <a
href="https://arxiv.org/abs/1910.10685">strong early results</a>, and
started a variety of collaborations to find experimental confirmation of
our results.</p>
<h2 id="complacency-shattered.">Complacency, shattered.</h2>
<p>2021 was an inflection point for me. I’d been promoted to tech lead
of my team, and for the first time since dropping out of grad school, I
finally stopped feeling like I was “behind” some of my peers who had
taken the straight path through a CS degree and into
Google/Facebook/Dropbox right out of undergrad. (I also happened to know
Greg Brockman from <a
href="https://www.acs.org/education/students/highschool/olympiad/about/participants/past-teams.html#:~:text=High%20School%2C%20NJ-,Gregory%20Brockman,-%2C%20(silver)%2C%20Red%20River">high
school summer camp</a>, which really didn’t help my imposter syndrome…)
I was now the one fielding questions from my friends in academia about
switching to industry, instead of the other way around. I had their
dream job - academic freedom to work on the forefront of science, with a
big tech salary.</p>
<p>I got complacent. I thought the environment at Google Brain was
intellectually stimulating enough to continue my personal growth, but
motivation, for me, comes from within. I had a frank conversation with
my manager and skip manager: I told them that the project I’d been
assigned was somebody else’s L5 promo project, but it wasn’t an L6
project. I didn’t feel like I was being stretched or being given room to
grow. They gave me some vague statements about how what I was doing was
the most impactful thing I could be doing for Google right now, but they
didn’t tell me I was wrong.</p>
<p>After that conversation, I felt my motivation drop; an easy project
that should have taken 6 months to finish dragged on to take 18 months
instead. In retrospect, while I’d diagnosed the problem correctly, my
demands were laughably naïve. L6 projects aren’t given to you; you
create them out of thin air. Your ability to envision and materialize a
new future is specifically what it means to be L6.</p>
<p>Unfortunately for me, envisioning a new future would be equivalent to
declaring mutiny on my manager, who was very protective of his
intellectual ownership of our team’s mission. I should have realized
this when I wrote a precursor to <a
href="https://www.moderndescartes.com/essays/rgb_odor/#in-search-of-the-rgb-of-odor">this
manifesto on the science of smell</a> in 2019 and received a deflection
rather than enthusiastic support from my manager. There was simply no
room on my team to grow, and it would have been better for myself and
for my team to have somebody who was trying harder.</p>
<p>Eventually, external events forced my hand. In early 2022, out of the
blue, our team was pulled into a conference room and told,
“Congratulations! You guys are now a startup!” Our manager, unhappy
with the size of our team, had petitioned Google leadership to grow
headcount by an order of magnitude, and the resolution was that Google
would not fund our expansion. Undeterred, he found external VC funding
and negotiated a spinout of our project as a new startup, and we were
finally informed as the ink dried on that deal.</p>
<p>From our perspective as Google employees, our project had effectively
been canceled, and our choices were to either find a new project, or
quit Google and follow our now ex-boss to Osmo. Our boss was now
offering us our old jobs back at “market rates for startup developers”.
“You get to put ‘founding engineer’ on your resumé - it’s just as good
as being an actual cofounder!”, he told us. I could tell I would not
succeed in negotiating a serious offer from him, and decided to look for
a new project at Google.</p>
<p>I spent the rest of 2022 asking pointed questions of myself and of
the L7+ PMs, VPs, and execs who had helped orchestrate the spinout. I
learned a great deal about how Google leadership made decisions: how
incentive structures aligned at each level of management, and how, if
you had the right idea to generate greater shareholder profit, you held
the moral high ground in escalating as far as you needed in order to
realize your vision. During this time I searched for new projects and
helped evaluate chemistry/ML project proposals in Climate. I tried
(unsuccessfully) to pitch my own project on refrigerant mixture design,
after seeing an unfilled niche. I would have kept on pitching, but
fatherhood called and I went on paternity leave.</p>
<p>During those sleepless nights with our newborn baby, I started
writing <a href="https://www.moderndescartes.com/essays/why_brain/">“Why
does Brain Exist”</a> and pondered my future. In the end, I never
resolved my project matching situation, as I got laid off shortly before
my paternity leave ended.</p>
<h2 id="post-layoff-blues">Post-layoff blues</h2>
<p>It took me a few months after the layoff to regain my footing. I was
still underwater from the first-time parent transition, and I was
uncertain about what direction I wanted to go next. As I talked with
many friends, I realized that a year after my boss’s betrayal, I was
still not quite sure what happened, why, or how I could protect myself
in the future. I refused to consider another big tech job, and I had an
irrational fear of anything LLM-related, given that it would bring me
closer to working with MBA-types.</p>
<p>I tried to understand things from my ex-boss’s perspective, talking
with other founders about how they viewed their ownership and relative
contributions from team members. I also read MBA/consultant literature.
I learned that on the stakeholder inform-consult-negotiate
classification scheme, I hadn’t even qualified for the “inform” bucket.
My conclusion was ultimately that Google execs and my management chain
did it because they could and because there was greater shareholder
value to be created. While I take some solace in knowing that my ex-boss
lost a year of momentum by having to rehire and re-gel his team, I
acknowledge that he probably found a replacement for me who’s willing to
work at market rates for startup developers and be happy with having
“founding engineer” on their resumé.</p>
<p>I resolved to at least learn enough about business to fall into the
“consult” bucket in the future.</p>
<p>I set up my LinkedIn for the first time, and learned that the vast
majority of my friends and friend-of-friend consultants were current or
former BCG. In the same way that I benefited from <a
href="https://www.moderndescartes.com/essays/readability/">Google’s deep
and rich internal culture</a> around best engineering practices, I
anticipated I might benefit from BCG’s similarly deep and rich internal
culture around best consulting practices. I practiced my case studies,
did some part-time consulting to get experience, and practiced my
presentation skills by rewriting my Brain essay in <a
href="https://en.wikipedia.org/wiki/MECE_principle">MECE format</a>. I
networked my way into meetings with sympathetic BCG partners who
referred me to their hiring directors. Unfortunately, I was stymied by
BCG’s hiring freeze. McKinsey and Bain were also in a similar
situation.</p>
<p>Around this time, an old friend reached out and convinced me to join
Motional as a systems engineer, working to collate, analyze, and present
information to executives on progress in capabilities and safety. I
figured that this would both tickle my technical itch as well as get me
the exec-facing experience I was looking for. I did enjoy getting up to
speed on the self-driving technology stack, but I was overwhelmed by the
crazy levels of bureaucracy I saw at Motional. I felt like there was no
way I could make a difference in the company’s success. I think that if
I’d stayed, I would have learned a great deal about negotiating with
unaligned parties, but I would also have gone mad! The “efficient
bureaucracy” theme running through some of my recent essays was my way
of working through the frustration I felt at Motional. So I quit.</p>
<p>Most recently, I’ve been having a great deal of fun at <a
href="https://lilacml.com/">Lilac</a>, a startup focused on text data
analysis and curation using LLMs. I have fully recovered from my malaise
- I’m quite actively involved in helping shape narrative/product/company
direction and don’t feel the ugh wall that previously stopped me from
getting near VCs.</p>
<h2 id="takeaways">Takeaways</h2>
<p>My biggest personal mistake was that I didn’t find a new mountain to
climb once I had summited the ML mountain. Ultimately, the reason I
stayed was that I was pinned by others’ expectations of my job: why
would anyone leave what was essentially a dream job? For example, I’d
considered and decided against going back to grad school, realizing that
I must be the only insane person who thought that particular side was
greener.</p>
<p>I felt far more alive in the four month period following the spinout
when I could focus on trying to pitch new projects in
Climate/ML/Chemistry.</p>
<p>I don’t think it was necessarily a mistake that I got screwed by my
ex-boss; that happens to everyone eventually. If it happens to you, my
advice is: don’t take it personally. Just remember what it feels like.
Don’t forget it. Learn what you can, and move on. And don’t try to
preempt it - once you start leaning into politics, it’s far too easy to
rely on it as a crutch to progress career-wise, instead of progressing
by honing your technical abilities.</p>
<h1>Bureaucratic Leverage</h1>
<p> Originally posted 2023-12-12</p>
<p> Tagged: <a href="/essays/tags/system_dynamics">system_dynamics</a>, <a href="/essays/tags/management">management</a>, <a href="/essays/tags/popular">popular</a></p>
<hr />
<p>Why do we hate bureaucracy?</p>
<p>Taken literally, a bureaucracy is just an organization tasked with
ensuring some outcome. In the public sector, OSHA ensures worker safety,
FDA ensures drug safety, EPA ensures environmental protection; in the
private sector, HR ensures legal compliance, IT ensures trade secrets
and data privacy, and so on. Yet even if people agree with the outcome,
they often disagree with the implementation. Bureaucracies have an
endless talent for finding wasteful and ineffective solutions.</p>
<p>Bureaucracies are ineffective due to a lack of accountability. If a
bureaucrat imposes a wasteful policy, what are the consequences? Well,
as long as they are achieving their desired outcome, they are doing
their job, regardless of the pain they inflict on others. They can wield
legal, technical, or financial penalties to force compliance. And
paradoxically, when bureaucrats fail to achieve their desired outcome,
they often get a bigger budget or a bigger stick to wield, rather than
being fired for incompetence. The inability to recognize failure goes
hand in hand with the inability to recognize success: competent and
ambitious people avoid working for bureaucracies because their efforts
go unrewarded. Bureaucracies end up staffed with middling managers, and
we have learned to hate them.</p>
<p>I don’t know how to solve this problem in the public sector, but I
think it’s solvable in the private sector, because there is
theoretically a CEO who is incentivized to maximize the overall
effectiveness of the company; they just need the right tactics. The
solution is simple: <strong>hold bureaucracy accountable by forcing them
to do the actual work</strong>. Let me explain.</p>
<h2 id="bureaucratic-leverage">Bureaucratic leverage</h2>
<p>Bureaucracies usually don’t do any work. This is true in two layered
senses:</p>
<ul>
<li>they don’t accomplish primary objectives; they are in the business
of ensuring secondary objectives.</li>
<li>they don’t do the work of accomplishing the secondary objectives
either; the work is usually pushed onto the same people accomplishing
the primary objectives.</li>
</ul>
<p>To give a concrete example: the FDA doesn’t research, develop, or
manufacture the drugs; pharmaceutical companies do. The FDA merely
ensures that the drugs are safe and effective. And in ensuring so, the
FDA doesn’t run the clinical trials; instead, the pharmaceutical
companies are responsible for running the trials at cost, and submitting
the paperwork to the FDA.</p>
<p><strong>Bureaucratic leverage</strong> is defined as the ratio of
work produced for <em>external entities</em> to do, relative to the
amount of work <em>directly done</em> by the bureaucracy. In this
example, the FDA’s 2023 human drugs budget was $2.3 billion <a
href="https://www.fda.gov/media/166182/download?attachment">(reference)</a>,
while the U.S. clinical trials market was $25 billion <a
href="https://www.biospace.com/article/releases/u-s-clinical-trials-industry-is-rising-rapidly-usd-35-1-bn-by-2030/">(reference)</a>.
To a first approximation, the FDA’s human drugs subdivision therefore
has a bureaucratic leverage ratio of 11x.</p>
<p>To give another example, GitHub in 2020 <a
href="https://github.com/github/renaming">changed the default git branch
name from <code>master</code> to <code>main</code></a>, a change
intended to promote greater inclusivity of historically and currently
enslaved peoples. I would estimate that roughly 3 person-months of
GitHub’s effort went into considering the impact and implementation
details of this change - a very generous and thoughtful investment into
inclusivity. Yet, the change imposes a global cost that I would roughly
estimate at ~1 million affected developers * 15 minutes per developer ≈
~1,500 person-months of effort, for an approximate bureaucratic leverage
ratio of 500x.</p>
<p>A high bureaucratic leverage ratio is not intrinsically a bad thing.
However, scope insensitivity is a real problem: when a bureaucrat wields
100x leverage, it is a heavy responsibility that is easily
underestimated. There are situations I’ve seen at Google where every
hour of downtime costs the company millions of dollars - and a crack
team of site reliability engineers whose combined hourly wages are tens
of thousands of dollars are desperately working to get it back up. That
is the level of urgency that a 100x leverage ratio <em>should</em>
demand. Does the typical bureaucrat with a 100x leverage ratio behave
with that level of urgency? Absolutely not.</p>
<h2 id="creating-bureaucratic-accountability">Creating bureaucratic
accountability</h2>
<p>“Force bureaucracies to do the work” now takes on a more precise
definition: we should hold bureaucracies to a 1x bureaucratic leverage
ratio.</p>
<p>The rationale is simple: it is globally efficient for a bureaucracy
to spend 1 unit of time, if it will reduce everyone else’s workload by
more than 1 unit of time. At this breakeven point, the bureaucracy will
have done roughly 50% of the total work. This rule is not meant to be
taken too literally, since these quantities are difficult to measure
precisely.</p>
<p>From the bureaucrat’s point of view, this means that they have two
budgets to manage: their internal budget, and the external budget for
asking other organizations to do something. Bureaucrats will be
incentivized to reexamine and optimize their external demands. From
everyone else’s perspective, they can be assured that what they’re asked
to do has been priority-sorted - or if it hasn’t been, they can at least
be assured that there’s a limited amount of it they’ll be asked to
do.</p>
<p>The truth is, this external budget has always existed - implicitly -
in the form of compliance. Consider a badly run IT/security department.
They run third-party security scanners on the company’s servers, and
file hundreds of low-value automated tickets with other teams to fix.
They require frequent password changes and relogins. They ban
installation of all non-approved apps and drag their heels on approving
new apps. A few days of “work” can easily generate years of lost
productivity for product teams, if their demands are taken at face
value.</p>
<p>In practice, people have limited tolerance for bullshit; if you flood
their bug tracker with automated security reports, they’ll just bookmark
a custom search page that filters out security reports. If you require
frequent password changes, they’ll use a formulaic password or keep a
password post-it on their monitor. If you drag your heels on approving
apps, they’ll upload the company data to a webapp or run it off a USB
stick. The oft-cited “bullshit umbrella” role of managers is essentially
a rate-limiter on bureaucracy.</p>
<p>On the positive side, centralization of work creates economies of
scale - a topic <a
href="/essays/codemates/#you-get-a-papercut-you-get-a-papercut-everybody-gets-a-papercut">I’ve
previously discussed in the context of code quality</a>. A bureaucrat
forced to grapple with personally doing a lot of repetitive paperwork
will very quickly decide that some paperwork was never necessary and
will invest in solutions to autofill fields where possible.</p>
<h2 id="embedded-bureaucrats">Embedded bureaucrats</h2>
<p>In a past life as a bureaucrat, my manager asked me to spend my first
three months doing a rotation with a partner team. On
paper it looked like he was just donating his headcount to other teams,
but I was amazed at how many secondary benefits came out of this
rotation program.</p>
<ul>
<li>It made us insiders: by working alongside the partner team, we
became friends and our requests were readily accepted by the partner
team.</li>
<li>We empathized with our partners: since we knew what burden our
requests would create, we could try to avoid wasted effort and respect
our partners’ time.</li>
<li>It made us credible: our partners could see that we were competent
and that they should believe us if we said something was necessary.</li>
<li>It was an advance payment: by giving free resources to the partner
team, we could later ask for at least that much without questions
asked.</li>
</ul>
<p>This rotation arrangement seems like a no-brainer to me, at least
within the company context. Something I don’t understand is why this is
frowned upon as a “revolving door” of corruption in the public sector,
when it is so plainly beneficial in the private sector.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Creating bureaucratic accountability begins with eliminating excuses,
which eventually lets us measure performance, punish poor performance,
and reward excellence. The usual excuse is noncompliance - “we made bad
demands and people ignored/worked around us. If only we had a bigger
stick to enforce compliance, we would have been able to accomplish our
goals.” A bureaucracy with a limited external budget means that people
who find ways to do more with less will be rewarded - the right
alignment of incentives.</p>
<h1>How to be a Good Codemate</h1>
<p> Originally posted 2023-10-11</p>
<p> Tagged: <a href="/essays/tags/software_engineering">software_engineering</a></p>
<hr />
<p>Every year, millions of college first years become roommates with
people they’ve never met before. There are no parents to set rules, and
that one roommate who keeps leaving their stuff everywhere doesn’t
seem to understand that they’re making everybody’s life harder. Sometimes,
one person does the thankless work of cleaning up after their roommates.
Arguments erupt over allocation of duties, and the takeaway lesson is
inevitably, “get your own place as soon as you can afford it”.</p>
<p>A similar situation plays out in tech companies as recent graduates
become codemates with engineers they’ve never met before. Normally there
are “parents” (senior engineers) to supervise, but this isn’t always the
case. Maybe the company hired too many junior engineers; maybe the
senior engineers are overloaded; maybe the senior engineers can’t or
won’t mentor. For junior engineers in these situations, here is a crash
course on <strong>How To Be a Good Codemate</strong>.</p>
<h2 id="why-you-should-care">Why you should care</h2>
<p>You should care if any of these happen regularly:</p>
<ul>
<li>The commands you were running yesterday don’t run today, and you
find yourself double-checking the main branch to see if that’s broken,
too.</li>
<li>You spend half your work day trying to diagnose a mysterious bug,
only to find out that the fix involves updating a dependency or
executing some other manual environment-altering step that you would
have never figured out on your own.</li>
<li>Your effort is spent in equal parts writing the actual code, and
packaging the code in a way that satisfies all of the automated linters,
tests, links to ticketing systems, and other blocking requirements.</li>
<li>You encounter significant merging/rebasing costs when trying to
merge your work into a quickly-moving main branch.</li>
<li>Your continuous integration pipeline routinely flakes, and the
recommended workaround is simply “try running it again”. People insist
on carefully reviewing every change, because they don’t trust that CI
will catch bugs.</li>
</ul>
<h2 id="how-to-be-a-good-codemate">How to be a good codemate</h2>
<p>The common theme here is that <em>none of the above problems have to
do with coding ability or code quality</em>. They have, instead, everything
to do with setting expectations on how you communicate and collaborate
within a shared codebase. You may additionally run into frustrations
directly related to bad code, but today I’d like to focus on non-coding
frustrations.</p>
<h3 id="tell-your-teammates-what-youre-working-on">Tell your teammates
what you’re working on</h3>
<blockquote>
<p>The commands you were running yesterday don’t run today, and you find
yourself double-checking the main branch to see if that’s broken,
too.</p>
</blockquote>
<p>When one person’s code change breaks another person’s workflow, it
isn’t necessarily the fault of the person who submitted the change. If
they didn’t know how to test your workflow, then they can’t possibly be
expected <em>not</em> to break you. So you have to tell them how to test
your workflow - ideally as a unit test, integration test, or in its
simplest form, a bash script or other command line invocation.</p>
<p>Assuming you have a continuous integration system configured, you can
even hook this script into CI as an integration test; a sketch follows
the list below. (Only do this if your script runs in less than, say, 30
seconds, perhaps by taking advantage of a flag like
<code>--data_fraction=0.001</code>. Long CI runtimes are an expensive
tax on development - avoid if at all possible.) Adding your scripts to
CI comes with two main benefits:</p>
<ul>
<li>your codemates can’t accidentally break you - they will be stopped
by CI!</li>
<li>in fact, your codemates will fix your script for you, by updating
your code, flags or whatever else the fix may involve. This is globally
optimal, as the person making the breaking change usually knows better
how to fix the breakage. (If this is not true, then it’s only fair for
them to ask you for help in fixing your script.)</li>
</ul>
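<p>As promised above, here is a sketch of such a smoke test (the
<code>train.py</code> entry point and its flags are hypothetical
stand-ins for your own workflow):</p>
<pre><code>import subprocess

def test_training_smoke():
    # Run the real entry point on a tiny slice of data so the test
    # finishes in seconds; any change that breaks the flags, imports,
    # or wiring now fails CI instead of surprising a teammate.
    result = subprocess.run(
        ["python", "train.py", "--data_fraction=0.001"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr</code></pre>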
<p>As the number of engineers collaborating in a codebase goes up, the
frequency of inadvertent breakage events goes up quadratically. The <a
href="https://abseil.io/resources/swe-book/html/ch11.html#the_beyonceacutesemicolon_rule">Beyonce
Rule</a>, as it is called at Google, is simply, “If you liked it, you
shoulda put a test on it.”, and is the only way to scalably inform your
team how not to break your code.</p>
<h3 id="some-user-assembly-required">Some User Assembly Required</h3>
<blockquote>
<p>You spend half your work day trying to diagnose a mysterious bug,
only to find out that the fix involves updating a dependency or
executing some other manual environment-altering step that you would
have never figured out on your own.</p>
</blockquote>
<p>Occasionally, a commit will require manual action for continued
correctness. For example, an updated dependency might require everyone
to run <code>pip install --upgrade some_library==newer.version</code>.
Or perhaps some AWS account permissions or buckets got changed and
everyone needs to update their .aws config file.</p>
<p>Changes requiring manual steps <strong>need to be announced
publicly</strong>. There is nothing sillier than having multiple people
independently debug a weird error for 1-2 hours before they all
simultaneously arrive in the team chat and ask, “Is anyone else seeing
this error?”, only for the offending committer to say, “Oh yeah, you
need to run XXX”. 5 minutes of writing up an announcement can save hours
of wasted time.</p>
<p>The real pro move is to use tools that transparently and
automatically install and use the currently checked-in configuration, to
eliminate this entire class of bugs.</p>
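<p>A low-tech sketch of that idea (the file names are illustrative, and
dedicated lockfile-aware tools do this better): hash the dependency
manifest at startup and fail fast on a stale environment.</p>
<pre><code>import hashlib
from pathlib import Path

REQUIREMENTS = Path("requirements.txt")
STAMP = Path(".requirements.sha256")  # written by your install script

def check_environment():
    # One actionable error beats a dozen people independently debugging
    # mysterious version-skew failures.
    current = hashlib.sha256(REQUIREMENTS.read_bytes()).hexdigest()
    if not STAMP.exists() or STAMP.read_text().strip() != current:
        raise RuntimeError(
            "Dependencies changed since your last install; "
            "run 'pip install -r requirements.txt' first."
        )</code></pre>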
<h3
id="you-get-a-papercut-you-get-a-papercut-everybody-gets-a-papercut">You
get a papercut, you get a papercut, everybody gets a papercut!</h3>
<blockquote>
<p>Your effort is spent in equal parts writing the actual code, and
packaging the code in a way that satisfies all of the automated linters,
test coverage requirements, links to tickets, and other blocking
requirements.</p>
</blockquote>
<p>Scrum masters aside, engineers are a frequent source of their own
bureaucratic slowdowns. Here’s how that might happen. Let’s say you want
to enable a new linter rule, which will cause a hundred new lint errors
to start appearing throughout the codebase. Instead of doing the boring
work of fixing all hundred errors concomitantly with the linter
configuration change, a tempting option is to use the “hold-the-line”
feature of some linters, allowing the new linter rule to go through, but
only enforcing the lint errors once somebody (else) touches the
offending code. This is a terrible idea.</p>
<p>What could have been 30 minutes of one engineer’s time, now becomes
something like 10 to 100 engineers * 5 minutes of time (due to context
switching costs) - a tremendous waste of time. Centralizing the work has
three main benefits: it is globally efficient for one engineer to do it,
it creates economies of scale (maybe you figure out a clever regex to
fix it all at once), and it puts the burden of proof on the right person
- if you don’t think the changes are worth your personal effort, then
why would you distribute that burden onto everyone else?</p>
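<p>The “clever regex” version of centralizing the work is often a
throwaway codemod; the particular fix below (retiring the deprecated
<code>assertEquals</code> alias) is just an illustration:</p>
<pre><code>import re
from pathlib import Path

# Rewrite every offending file in one commit, submitted together with
# the new lint rule.
for path in Path("src").rglob("*.py"):
    text = path.read_text()
    fixed = re.sub(r"\bassertEquals\(", "assertEqual(", text)
    if fixed != text:
        path.write_text(fixed)</code></pre>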
<p>I’ve been pleased with the industry transition from linters that nag
you about formatting issues, to formatters that automatically fix those
formatting issues. The latter requires more up front investment but
saves time in the long run. More engineers should try to adopt this
mindset.</p>
<h3 id="sorting-the-bookshelf-by-color">Sorting the Bookshelf by
Color</h3>
<blockquote>
<p>You encounter significant merging/rebasing costs when trying to merge
your work into a quickly-moving main branch.</p>
</blockquote>
<p>This one is a fundamentally hard problem - with <span
class="math inline">\(N\)</span> engineers working closely together,
there are <span class="math inline">\(O(N^2)\)</span> opportunities to
step on each others’ toes. One common toe-stepping maneuver is
refactoring - renaming modules, renaming variables/classes, moving
attributes/functions/classes around, regrouping code, or even fixing
whitespace. Because refactoring touches a small number of lines of code
across many files, merge conflicts are inevitable.</p>
<p>Refactoring the codebase has benefits: it compresses the mental map
needed to understand how the codebase works. However, it also has costs:
people have to relearn their mental map. A needless refactor is like
sorting a bookshelf by color - unnecessary, annoying, and
productivity-destroying. So the first rule of refactoring is Don’t
Refactor. Try to get your formatting and naming right the first time.
(Related discussion: <a href="/essays/noutils">better naming for
utils</a>)</p>
<p>The second rule of refactoring is: don’t mix refactors with feature
changes. Refactoring changes are 10-100x easier to review than normal
feature-adding changes. This is also true for every engineer who must
resolve merge conflicts by applying the refactoring rule to their own
code. If you mix refactors with feature changes, what happens is that
the fast-path to understanding and applying the changes is no longer a
valid shortcut! This is aggravating to everyone involved.</p>
<p>At larger scales, managing refactors requires a new set of tools and
approaches; search for “Rosie” in the <a
href="https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext#FNF">Google
monorepo whitepaper</a> for a sketch of the complexities involved.</p>
<h3 id="a-leap-of-faith">A Leap of Faith</h3>
<blockquote>
<p>Your continuous integration pipeline routinely flakes, and the
recommended workaround is simply “try running it again”. People insist
on carefully reviewing every change, because they don’t trust that CI
will catch bugs.</p>
</blockquote>
<p>CI needs active maintenance. Flaky tests build up, causing CI to have
to rerun multiple times before passing; the number of untested workflows
(and accidental breakage) increases; CI runtime only ever seems to
increase. Eventually, what happens is that people stop trusting their
CI. People are desensitized to the frequent “CI Is Broken at HEAD!”
automated pings. Code review slows to a grind; since CI cannot be
trusted, the most senior engineers become reviewing bottlenecks as they
are the only ones who can anticipate whether a change is safe to merge.
Every change needs a customized set of manual tests to demonstrate
correctness.</p>
<p>I prefer the “leap of faith” strategy: just pretend that your CI can
be trusted, even if you don’t think it should be! Then, when you
inevitably merge broken code - figure out what sort of test would have
caught your mistake, and add it! You may think that breaking the main
branch is expensive, but it is equally expensive to suffer through a
worry-laden code review process. Having a quick rollback procedure is a
good way to mitigate accidental breakage, and further enables this
strategy.</p>
<h2 id="conclusion">Conclusion</h2>
<p>As teams scale in size, coordination headwinds and Conway’s law are
inevitable. Ultimately, the solution is to embrace Conway’s law, and
shard the codebase along organizational lines, to reduce the <span
class="math inline">\(O(N^2)\)</span> cost of coordinating many peoples’
work. Still, a closely knit working group is more powerful than smaller
independent groups, if they can manage not to step on each others’ toes.
By identifying and solving these coordination issues, individual teams
can forestall the inevitable Conway sharding.</p>
<h1>Simplifying Fluffy Constructors in Unit Tests</h1>
<p> Originally posted 2023-09-23</p>
<p> Tagged: <a href="/essays/tags/software_engineering">software_engineering</a></p>
<hr />
<p>The archetypal unit test looks like this:</p>
<pre><code>arg1 = ...
arg2 = ...
expected_output = ...
actual_output = function_to_test(arg1, arg2)
assertEqual(expected_output, actual_output)</code></pre>
<p>A very common problem is that, over time, objects accumulate fields
and subobjects, until it takes significant effort just to construct an
object. Constructing <code>arg1</code>, <code>arg2</code>, and
<code>expected_output</code> can take hundreds of lines, while the
function call and the assertion are just two lines. These tests are like
cotton candy: a tremendous amount of fluff with a tiny core. Well, at
least cotton candy is tasty. This fluff is tedious to write, tedious to
review, and tedious to scroll through, which leads to less unit testing
than is optimal. It’s like chatting with that overly friendly downstairs
neighbor who takes thirty minutes to tell you that the condo insurance
is up for renewal.</p>
<p>The most common coping mechanism for fluffy constructors is the
singleton: one example object that feeds into every test. Often, this
singleton ends up in the setUp() method shared by all tests. The many
fields of the shared singleton are pinned by various different unit
tests’ assertions, and gradually it becomes impossible to either
customize the object, or to add new unit tests. When the test class
reaches this point, the process starts all over with a new freshly made
singleton object and a new test class. This seems a little bit silly.
But how can we do better?</p>
<h2 id="factory-methods-hide-fluff">Factory methods hide fluff</h2>
<p>The first step towards simplifying fluffy tests is to decide which
details are relevant.</p>
<p>Take this test:</p>
<pre><code>car1 = Vehicle(
mass_kg=2000,
location=Location(x_m=0, y_m=0),
velocity=Velocity(x_m_s=4, y_m_s=3),
heading=math.atan2(3, 4),
width_m=1.8,
length_m=4.0,
emergency_vehicle=False,
)
car2 = Vehicle(
mass_kg=2000,
location=Location(x_m=4, y_m=-2),
velocity=Velocity(x_m_s=0, y_m_s=5),
heading=math.atan2(5, 0),
width_m=1.8,
length_m=4.0,
emergency_vehicle=False,
)
self.assertEqual(car1.speed_m_s, 5)
self.assertEqual(car2.speed_m_s, 5)
self.assertTrue(willCollideWithin5sec(car1, car2))</code></pre>
<p>Many of these fields are irrelevant, so we may as well hide them
behind a factory method that sets sensible defaults.</p>
<pre><code>car1 = make_suv(
location=Location(x_m=0, y_m=0),
velocity=Velocity(x_m_s=4, y_m_s=3),
)
car2 = make_suv(
location=Location(x_m=4, y_m=-2),
velocity=Velocity(x_m_s=0, y_m_s=5),
)
self.assertEqual(car1.speed_m_s, 5)
self.assertEqual(car2.speed_m_s, 5)
self.assertTrue(willCollideWithin5sec(car1, car2))</code></pre>
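<p>A sketch of what <code>make_suv</code> might look like (the defaults
are illustrative):</p>
<pre><code>def make_suv(**overrides):
    """Builds a default SUV; tests override only the fields they care about."""
    defaults = dict(
        mass_kg=2000,
        location=Location(x_m=0, y_m=0),
        velocity=Velocity(x_m_s=0, y_m_s=0),
        heading=0.0,
        width_m=1.8,
        length_m=4.0,
        emergency_vehicle=False,
    )
    defaults.update(overrides)
    return Vehicle(**defaults)</code></pre>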
<p>You might object that factory methods just hide the fluff. It’s true
that if you only have one unit test, this new solution is the same
number of lines of code. But as the marginal cost of testing drops,
you’ll get more tests. It’s also easier to manually verify that the unit
test is correct.</p>
<h2 id="dsls-hide-syntactic-fluff">DSLs hide syntactic fluff</h2>
<p>In certain cases, the fluff is due to language syntax itself! You
might think it isn’t possible to eliminate this type of fluff, but
writing your own DSL is a powerful technique to do just that.</p>
<p>Which would you rather see?</p>
<pre><code>go_board = np.array([
[go.KO,1,1,0,0,0,0,0,0],
[1,-1,0,0,0,0,0,0,0],
[-1,0,-1,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
])</code></pre>
<p>or</p>
<pre><code>go_board = parse_board('''
*XX......
XO.......
O.O......
.........
.........
.........
.........
.........
.........
''')</code></pre>
<p>The latter contains far less visual noise, with half the total
characters. It features a sensible null character, flexible whitespace
for convenient embedding of inline data, and monospaced content.
Definitely easier to read and write.</p>
<p>In many cases, you can reuse existing DSLs instead of having to
create your own. This lets you even skip writing the parser - you can
use a library for that. Instead of manually constructing a Pandas
dataframe, why not just embed and parse a .csv? Instead of manually
constructing a giant config object, why not parse YAML? Instead of
manually constructing nodes and an adjacency graph, why not parse <a
href="https://en.wikipedia.org/wiki/DOT_(graph_description_language)">DOT</a>?
Possibly the most obscure DSL I’ve ever written is for <a
href="https://github.com/open-reaction-database/ord-schema/blob/9c9e852d5e1b5680d6545eafeca1ccf47d87b641/ord_schema/macros/workups.py">organic
chemistry reaction workups</a>!</p>
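<p>As one example of borrowing a DSL, a hand-built dataframe can become
an embedded CSV literal (the columns here are made up):</p>
<pre><code>import io

import pandas as pd

# Let pandas' battle-tested CSV parser do the work instead of
# hand-assembling columns.
trades = pd.read_csv(io.StringIO("""\
symbol,price,qty
AAPL,190.5,10
GOOG,140.2,5
"""))</code></pre>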
<h2 id="conclusion">Conclusion</h2>
<p>To defluff is to be human. We describe weather as sunny, cloudy, or
rainy without having to specify temperature, humidity, cloud cover, or
wind conditions. If a concept has been around for more than a decade,
chances are, a very compact DSL already exists for it, and you won’t
have to invent a new one.</p>
<p>Fluffy unit tests are annoying to read and write, but more than that,
they discourage writing more unit tests. By investing in methods to
defluff object constructors, it becomes a lot easier to write
comprehensive unit test suites, and the unit tests become far easier to
manually verify. As a bonus, factory methods and DSLs often end up quite
useful outside of writing test cases, too - they make it easier to write
tutorial notebooks or to construct ad-hoc objects during a debugging
session.</p>
<h1>Optimal Bureaucracy</h1>
<p> Originally posted 2023-09-12</p>
<p> Tagged: <a href="/essays/tags/math">math</a>, <a href="/essays/tags/software_engineering">software_engineering</a>, <a href="/essays/tags/system_dynamics">system_dynamics</a></p>
<hr />
<p>This past week, I ended up annoyed with continuous integration
systems. In short: the CI consisted of three stages in the following
order: basic linters/formatters (10 minutes, 90% pass rate), unit tests
(25 minutes, 95% pass rate), and SonarQube (5 minutes, 95% pass rate).
You would think linters/unit tests should have 100% pass rate, but hey,
developers are lazy and they push minor edits and hit the merge button,
without bothering to run the tests locally first. I happened to run into
a SonarQube issue (and it wasn’t locally reproducible), so I had to wait
for this long pipeline to iterate on the fix. This was frustrating and I
had the vague intuition that SonarQube <em>shouldn’t be at the end of
the CI pipeline</em>. But was that actually true, or was I just unlucky
that it was the last step that was failing?</p>
<p>It turns out that this type of question occurs in many different
settings.</p>
<ul>
<li>In drug development, drugs go through multiple optimization stages -
first, the molecular structure is optimized for drug activity; then it
is tweaked for <a href="https://en.wikipedia.org/wiki/ADME">ADME
properties</a>; then it’s tested in animals for toxicity; then it goes
through a series of clinical trials. Is this the right ordering?</li>
<li>At Google, project launches had to go through multiple reviews:
security review, legal, regulatory, SRE, product, (and probably more I’m
forgetting). One typical failure mode of this process is that the
engineering is done up front and then engineers get annoyed and push
back when a reviewer tells them that their product is fundamentally
illegal/insecure/doesn’t fit with the product portfolio. Should the
engineers have instead worked to satisfy some subset of the reviewers
before even starting on their work?</li>
<li>You’re trying to figure out where a group of family/friends will get
together for a big reunion. Everybody has their own constraints and
preferences for where/when/how this should happen. In what order should
you check your suggested plan with everyone?</li>
<li>Any <a
href="https://en.wikipedia.org/wiki/Waterfall_model">waterfall</a>
project has to choose the order in which they address requirements
and/or stakeholders.</li>
</ul>
<p>Let’s formalize the CI question to a more general setting and solve
that problem.</p>
<h2 id="the-math">The Math</h2>
<p>Here’s how you might formalize this problem.</p>
<p>You have <span class="math inline">\(n\)</span> requirements, each
with some cost <span class="math inline">\(C_i\)</span> and probability
of success <span class="math inline">\(P_i\)</span>. The probabilities
are all independent. You must complete all requirements in sequence; if
any requirement fails you must start over from scratch. What ordering of
requirements will minimize the total expected cost of the process?</p>
<h3 id="solution">Solution</h3>
<p>Let <span class="math inline">\(E_i\)</span> denote the expected cost
of completing steps <span class="math inline">\(1\)</span> through <span
class="math inline">\(i\)</span>; then</p>
<p><span class="math display">\[E_i = \frac{E_{i-1} + C_i}{P_i}\]</span>
<span class="math display">\[E_n = \frac{C_1}{P_1P_2P_3\ldots P_n} +
\frac{C_2}{P_2P_3\ldots P_n} + \frac{C_3}{P_3\ldots P_n} + \ldots +
\frac{C_n}{P_n}\]</span></p>
<p>We could brute force over all <span class="math inline">\(n!\)</span>
possible orderings and pick the one with lowest cost - but hey, if this
were the best solution available, I wouldn’t be writing about it :D</p>
<p>What would a more elegant solution look like? Some wishful thinking
says that if we found some function <span class="math inline">\(F(C_i,
P_i)\)</span>, and sorted the tasks by this function, then we could
achieve <span class="math inline">\(O(n \log n)\)</span>.</p>
<p>If we try solving the case with <span
class="math inline">\(n=2\)</span>, we’ll find that we end up with a
candidate function <span class="math inline">\(F =
\frac{C_i}{1-P_i}\)</span>. (I use <span
class="math inline">\(\stackrel{?}{<}\)</span> to denote unknown
ordering)</p>
<p><span class="math display">\[
\begin{align*}
\frac{C_1}{P_1P_2} + \frac{C_2}{P_2} \stackrel{?}{<}&
\frac{C_2}{P_1P_2} + \frac{C_1}{P_1} \\
C_1\frac{1-P_2}{P_1P_2} \stackrel{?}{<}& C_2\frac{1-P_1}{P_1P_2}
\\
\frac{C_1}{1 - P_1} \stackrel{?}{<}& \frac{C_2}{1-P_2}
\end{align*}
\]</span></p>
<p>The implication is that the ordering <span class="math inline">\(1,
2\)</span> is more efficient than <span class="math inline">\(2,
1\)</span> if and only if <span class="math inline">\(F(1) <
F(2)\)</span>.</p>
<p>Does sorting by <span class="math inline">\(F\)</span> yield an
optimal solution? Yes. Consider any two adjacent steps <span
class="math inline">\(i\)</span> and <span
class="math inline">\(i+1\)</span>. The cost up through <span
class="math inline">\(i-1\)</span> does not depend on <span
class="math inline">\(i, i+1\)</span>, and the specific ordering of
<span class="math inline">\((i, i+1)\)</span> vs <span
class="math inline">\((i+1, i)\)</span> does not constrain steps <span
class="math inline">\(i+2...\)</span> in any way. So we are free to
optimize the ordering of <span class="math inline">\(i\)</span> and
<span class="math inline">\(i+1\)</span> without regard to the rest of
the sequence. The optimal ordering of <span
class="math inline">\(i\)</span> and <span
class="math inline">\(i+1\)</span> turns out to be identical to the
solved case of <span class="math inline">\(n=2\)</span> - sort by <span
class="math inline">\(F\)</span>! Any order inversions can be made more
efficient by flipping the two elements. Writ large, this implies that
you can bubble sort your list.</p>
<p>The conclusion: optimal ordering is accomplished by sorting
requirements by <span class="math inline">\(F(i) = \frac{C_i}{1 -
P_i}\)</span>.</p>
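<p>As a quick numerical sanity check (my own sketch, not part of the
proof), sorting by <span class="math inline">\(F\)</span> matches a
brute-force search over orderings:</p>
<pre><code>import itertools
import math
import random

def expected_cost(stages):
    # stages: list of (cost, p_success); on failure, restart from scratch.
    # Implements the recurrence E_i = (E_{i-1} + C_i) / P_i.
    e = 0.0
    for cost, p in stages:
        e = (e + cost) / p
    return e

random.seed(0)
stages = [(random.uniform(1, 30), random.uniform(0.5, 0.99))
          for _ in range(6)]
greedy = sorted(stages, key=lambda s: s[0] / (1 - s[1]))
best = min(itertools.permutations(stages), key=expected_cost)
assert math.isclose(expected_cost(greedy), expected_cost(best))</code></pre>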
<h2 id="optimizing-ci">Optimizing CI</h2>
<p>Recall that we had basic linters/formatters (10 minutes, 90% pass
rate), unit tests (25 minutes, 95% pass rate), and SonarQube (5 minutes,
95% pass rate). The function F for these three stages evaluates to 100,
500, and 100. So we can say that SonarQube should always happen before
unit tests, and is tied with the linters/formatters. However, in the
scenario where SonarQube is already failing, then the probability of
passing it with your fixes is something lower, perhaps 50-75%. In this
scenario, SonarQube clearly belongs at the start of the CI pipeline,
with a score of 10-20.</p>
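<p>In code, the scoring is one line per stage (numbers from the scenario
above):</p>
<pre><code># F = C / (1 - P); lower scores should run earlier.
stages = {"linters": (10, 0.90), "unit tests": (25, 0.95), "sonarqube": (5, 0.95)}
for name, (cost, p) in stages.items():
    print(name, round(cost / (1 - p)))  # linters 100, unit tests 500, sonarqube 100</code></pre>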
<p>In this toy example, I use time as a cost metric, but it can of
course encompass compute or SaaS costs as well.</p>
<p>I’d love to see CI platforms allow the flexibility to reorder stages
according to which one failed most recently, or to have more intelligent
ordering according to empirically observed failure rates and durations.
One could argue that SonarQube should be made available locally, but
there are many valid CI use cases that can’t be run locally - for
example TensorFlow’s CI used to run against {Windows, Linux} X {CPU,
GPU, TPU} targets, with corresponding hardware/OS maintenance
requirements. (This CI burden is the primary reason why TensorFlow <a
href="https://discuss.tensorflow.org/t/2-10-last-version-to-support-native-windows-gpu/12404">dropped
native Windows support</a>!) As it is, most CI configurations are done
by hand and don’t ever change.</p>
<h2 id="acknowledgments">Acknowledgments</h2>
<p>Thanks to <a
href="https://www.linkedin.com/in/jay-leeds-6a588919a/">Jay Leeds</a>
for providing a solution and alerting me that a close variation of this
problem was posed in <a
href="https://www.youtube.com/watch?v=c_ilfGOnBtE">2023’s ICPC NAC</a>.
(It seems the author of that problem was similarly frustrated with their
CI!)</p>
<h1>Better ways to name your utils module</h1>
<p> Originally posted 2023-09-06</p>
<p> Tagged: <a href="/essays/tags/software_engineering">software_engineering</a></p>
<hr />
<p>As the joke goes, the two hardest problems in computer science are 1)
cache invalidation and 2) naming. Like all good jokes, there’s a kernel
of truth in there. Naming is poetry; the perfect name has both precise
meaning and precise connotation. Naming is also mathematics; the
judicious choice of which concepts deserve a name can trivialize a
problem. In software engineering terms, good naming is equivalent to
good factoring of the problem domain into the right abstractions and
APIs.</p>
<p>A utils.py module is a failure of naming. Let’s talk about ways we
can improve the situation.</p>
<h2 id="utils.py-considered-harmful"><code>utils.py</code> Considered
Harmful</h2>
<p>see also: <code>helper.py</code>, <code>misc.py</code></p>
<p>First, a quick rundown: why exactly are utility modules considered
harmful?</p>
<p>At a philosophical level: code should make its intent clear. Nobody
would let a function name like <code>do_stuff()</code> pass code review.
So why tolerate an equally ambiguous module name?</p>
<p>At a practical level: utility modules tend to accumulate
dependencies, causing everything to depend on everything via the utils
bottleneck. It’s a great breeding ground for inadvertent circular
dependencies.</p>
<p>At a social level: the existence of one module named
<code>utils.py</code> implicitly grants permission to create more,
leading up to that doleful moment when you have to resolve a name
collision between two or more <code>utils</code> modules.</p>
<h3 id="not-considered-harmful-foo_utils.py">Not Considered Harmful:
foo_utils.py</h3>
<p>While the ideal codebase should not have any <code>utils.py</code> in
it, a pragmatic compromise is to categorize your utility code.
<code>foo_utils.py</code> demonstrates intention: it is about
<code>foo</code>, but more importantly, it is not about stuff that is
not <code>foo</code>. <code>utils/foo.py</code> is also okay.</p>
<h2 id="sorting-your-utilities">Sorting your utilities</h2>
<p>I’ve seen many flavors of utility code which could be easily sorted
into more appropriate categories. See if any of the following categories
match your code:</p>
<p><code>$PLATFORM_utils</code> - hacks, workarounds, and codified usage
patterns for a platform’s deficiencies and inconveniences. Retry/backoff
logic? Concurrency and consistency workarounds? Missing primitives?
Auth? Environment management?</p>
<p><code>testing_utils</code> - make the testing process easier
(randomness, parametrization, fuzzing, customized assertions, etc.). Do
not put test fixtures or mocks here! Those belong alongside the unit
tests that consume them. There are lots of great libraries out there for
making your tests better - <code>mock</code>, <code>parametrized</code>,
<code>hypothesis</code>, to name a few.</p>
<p><code>$DOMAIN_SPECIFIC_CONCEPT</code> - Your domain probably has some
domain-specific concepts that are not obvious to outsiders. Middleware?
Augmentation? Symmetries? If you’re relatively inexperienced in the
problem domain, you may not realize these concepts exist, and reinvent
them poorly in the utils module. Read other OSS codebases, papers, or
books to learn what these concepts are. A special-shoutout goes to
parsers and compilers, which people reinvent badly on a regular
basis.</p>
<p><code>base</code> - Foundational data types, definitions, and
concepts that are used pervasively throughout the codebase. This will
get imported everywhere, so keep a strict watch on its dependencies.</p>
<p><code>$SYSTEM_client</code>: When data generated by one system is
consumed by another, and their APIs don’t quite align, then some adaptor
code is needed to munge the data formats. If both systems are under your
control, you should figure out a better API. If one or both of those
systems is from a third party provider, then you have no choice but to
write adaptor code. As a company grows, it’s pragmatic for teams to
start treating each other as third parties, depending on org chart
distance.</p>
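<p>Adaptor code is usually mundane field-munging; a sketch, with all
field names hypothetical:</p>
<pre><code>from datetime import datetime, timezone

def upstream_event_to_internal(event: dict) -> dict:
    # The upstream system emits camelCase JSON with millisecond
    # timestamps; our code wants snake_case fields and datetime objects.
    return {
        "user_id": event["userId"],
        "created_at": datetime.fromtimestamp(
            event["createdAtMs"] / 1000, tz=timezone.utc),
    }</code></pre>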
<p><code>visualizations</code>: often used interactively and tends to
invoke libraries with unique GUI or system dependencies that don’t work
on CI or other headless deployments.</p>
<p>single-use code: Code that only has one caller. This often makes its
way into utility modules in an attempt to hide ugly code. You should
keep this code right next to its caller, or inline the code. Nobody is
being fooled by the indirection.</p>
<h2 id="codebase-maintainers-do-you-have-a-utils-problem">Codebase
maintainers: Do you have a utils problem?</h2>
<p>Try measuring the percentage of code that lives in “utils” modules.
You can accomplish this by running <code>cloc</code> on your codebase,
and then running
<code>find . -name "utils.py" | cloc --list-file=-</code> to get
util-specific metrics.</p>
<p>Broken down by percent of code in utils modules:</p>
<ul>
<li>0-2%: Healthy.</li>
<li>2-10%: Unhealthy. Share this essay with your team and discourage
further additions to utils.py</li>
<li>10+%: Morbid. Your codebase – or possibly your management – needs an
intervention of some sort.</li>
</ul>
<p>If you have a healthy amount of code in utils modules,
congratulations! I’d suggest writing up the <code>cloc</code> commands
as a script to monitor regression; a sketch of such a script follows.
Of course, beware Goodhart’s law, and don’t hold anyone to this
metric!</p>
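<p>A sketch of that script (assuming <code>cloc</code>’s
<code>--json</code> output and the same <code>find</code> pattern as
above):</p>
<pre><code>import json
import subprocess

def cloc_code_lines(args, stdin=None):
    out = subprocess.run(["cloc", "--json", *args], input=stdin,
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)["SUM"]["code"]

total = cloc_code_lines(["."])
utils_files = subprocess.run(["find", ".", "-name", "utils.py"],
                             capture_output=True, text=True, check=True).stdout
utils = cloc_code_lines(["--list-file=-"], stdin=utils_files)
print(f"{100 * utils / total:.1f}% of code lives in utils modules")</code></pre>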
<h1>Blogging FAQs</h1>
<p> Originally posted 2023-07-11</p>
<p> Tagged: <a href="/essays/tags/personal">personal</a></p>
<hr />
<h2 id="how-long-does-it-take-me-to-write-an-essay">How long does it
take me to write an essay?</h2>
<p>Definitely the most FAQ.</p>
<p>I spend anywhere from 5-100 hours spread over 1-12 calendar weeks.
“<a href="/essays/why_brain">Why did Google Brain exist?</a>” and “<a
href="/essays/data_oriented_python">Data Oriented Programming in
Python</a>” took on the longer side, and “<a
href="/essays/research_code">A Research Codebase Manifesto</a>” took on
the shorter side (but only because I’d already done so much thinking on
the topic). This FAQ took 2 hours in two sittings.</p>
<p>There is no clear separation between ideation/writing/editing; I
think by writing. I’ve deleted as much as 80% of my writing before
publication. Most of my essays are based on insights that I’ve already
vaguely understood from the work I’ve done over the years. The writing
time goes into isolating the insight, testing its limits, and finding
the clearest way to present it.</p>
<p>This is a lot of time to spend per essay. I think it just reflects my
personal expectations on how thoroughly I want to understand an idea
before moving on. My expectations have risen over time, probably more
quickly than my ability to meet my own expectations, leading to overall
fewer essays and more time spent per essay. Still, as long as I manage
to publish a few essays a year, I’m happy with my output. I wouldn’t
want to publish anything I didn’t feel proud of.</p>
<h2 id="why-do-you-write">Why do you write?</h2>
<p>There are three main reasons. First, I like thinking, and writing is
the best way to think critically and honestly. Second, I like teaching,
and writing is a great way to pass on wisdom. Third, it gets me street
cred. It’s like a resumé, but one that people actually enjoy reading. It
opens doors to conversations that wouldn’t otherwise happen.</p>
<h2 id="how-do-i-get-started-writing">How do I get started writing?</h2>
<p>Write. It’s really that simple.</p>
<p>It doesn’t matter what platform or tech you use. The set of people
who talk about the right way to start a blog will have stopped blogging
within a year on average, and it is precisely these people who seem to
be drawn to arguing about whether Medium or Substack is a better place
to blog. Ignore them. In ten years, people will still be reading my blog
and as they stumble on this FAQ, they will wonder what Substack was.</p>
<p>My only recommendation is to (eventually) own your content in a
platform-neutral format, so that you can migrate platforms at will. I
started on Blogger.com; moved to a personal domain (a django site hosted
by… Hostgator, IIRC?), then <a href="/essays/gcs_static">rewrote it as a
static site generator</a> hosted on Google Cloud Storage. The latest
iteration is a static site hosted on Firebase, for its SSL support. You
can see the <a
href="https://github.com/brilee/modern-descartes-v2">static site
generator and raw essay files on GitHub</a>.</p>
<h2 id="what-should-i-write-about">What should I write about?</h2>
<p>Write what you know and have experienced. The world is a better place
when you delete your hot take on a flamebait topic where you have no
particular expertise.</p>
<p>For many of us, this means job experience. As a result, you might
feel like you would be violating confidentiality or that you might be
offending your employer by writing about your work. Your gut instinct is
probably correct. So, instead of writing about company strategy, write
about the general strategic landscape that would enable someone to
understand the company strategy (if they had access to it). Instead of
complaining about your employer, think through how the situation might
be improved. It’s more positive-sum, lets you roleplay your boss’s job,
and helps you level up your soft skills. Finally, you can write on a
time delay. “<a href="/essays/deep_learning_emr">Deep Learning on EMRs
is Doomed to Fail</a>” was an essay I could have written in 2017, but at
that time it would have been nonobvious and definitely confidential. In
2022, it was something that everybody in the field already knew, and
having quit Verily in 2018, I was able to say it out loud, since there
was no chance it would be mistaken as an official company stance.</p>
<p>Not all of my writing is public. I’ve written well-reviewed
documentation at Google on TensorFlow and ML readability, and was
fortunate enough to have some of it published as <a
href="https://www.tensorflow.org/guide/function">official TensorFlow
guides</a>.</p>
<h2 id="how-do-i-get-better-at-writing">How do I get better at
writing?</h2>
<p>Write more. Be intentional about the editing process. Read your blog
posts out loud. Try to explain the idea to a friend. Ask a more
experienced writer to give you feedback. Decide who your target audience
is and decide what their level of background is. (I target a relatively
sophisticated audience because it’s honestly tedious to write too much
introductory material.)</p>
<p>My early essays are bad. I look back at them and think, “wow, I have
no idea what this guy is trying to say”. I do occasionally go back and
re-edit them, and I have even deleted some that I deemed unsalvageable.
It’s now been over ten years since I started writing, and my writing has
improved tremendously. Yours will too.</p>
<p>The good news is that at first, nobody will read your blog, so
there’s really nothing to be ashamed of by publishing your writing.</p>
<p>To get people to read your blog, share your work on Reddit, Facebook,
Linkedin, HN, wherever you can get eyeballs on your work. It’ll probably
disappear into the noise; sometimes you’ll get <a
href="https://www.reddit.com/r/programmingcirclejerk/comments/14re3ky/readability_googles_temple_to_engineering/">snarky
comments on how bad your writing is and how horrible of a person you
must be</a>. I used to get upset when I saw those comments. Nowadays, I
endorse the “u mad bro?” school of philosophy.</p>
<h2 id="have-more-questions">Have more questions?</h2>
<p>Email me.</p>
Readability: Google's Temple to Engineering Excellence
<p> Originally posted 2023-07-03</p>
<p> Tagged: <a href="/essays/tags/software_engineering">software_engineering</a>, <a href="/essays/tags/management">management</a>, <a href="/essays/tags/popular">popular</a></p>
<hr />
<p>When reflecting on my six years at Google, its readability process
stands out as unique within the tech landscape.</p>
<p>As a readability mentor, I’ve reviewed roughly 100,000 lines of
Python code at Google, written by hundreds of different authors. In
doing this, I am one of thousands at Google who collectively have
shepherded hundreds of thousands of Googlers through the readability
process. The sheer scale of this program has shaped the entire tech
industry’s conception of “idiomatic Python/Java/C++/Go”.</p>
<p>I want to discuss what readability is, how it affects Googlers
(myself and others), its cultural significance within Google, and
whether it makes sense to recreate it outside of Google’s walls.</p>
<h2 id="readability-at-google">Readability at Google</h2>
<p>At Google, every change is required to have one approval from a
maintainer of that corner of the codebase. Most companies do this -
nothing strange here. However, uniquely to Google, every change is
<em>also</em> required to have one approval from somebody who “has
readability”. Having readability means that you know the language’s ins
and outs, design patterns, ecosystem of libraries, and idiomatic usage
at Google, and are thereby trusted to catch any issues in language
usage. The readability requirement is satisfied if either the author or
any reviewer has readability. I would estimate that about a third to a
half of Googlers have readability in their primary work language.</p>
<p>To get readability, you submit code you’ve written, and a readability
mentor is randomly drawn from a pool to review your code. You are
encouraged to read and follow the relevant style guides to avoid trivial
back-and-forth. After writing enough “good” code in that language, you
are granted readability.</p>
<p>You can learn more about Readability in the <a
href="https://abseil.io/resources/swe-book/html/ch03.html#readability_standardized_mentorship_thr">SWE
Book</a>.</p>
<h2 id="readabilitys-impact-on-individual-googlers">Readability’s impact
on individual Googlers</h2>
<p>To many Nooglers, Readability’s enshrinement at the very core of the
code submission mechanism seems like unnecessary bureaucracy. To many
veteran Googlers, readability <em>still</em> seems like unnecessary
bureaucracy. You can choose not to get readability (totally allowed!),
but if too few people on your team have readability, then the team’s
work can grind to a crawl when those reviewers go on vacation. Googlers
undertaking their readability pilgrimage will statistically encounter
that one readability reviewer who takes pride in their prowess as a
human code linter, or has a vendetta against some language feature.
These are just some of the many valid reasons to dislike readability at
Google.</p>
<p>Still, for every person who dislikes the readability process, there
are many others who have been helped. Many of my coworkers have become
much better engineers by making the readability pilgrimage, and I hope
that the hundreds of diffs I’ve reviewed have directly improved the
codebase at Google. Readability’s systematic influence on the Google
codebase also leads to a more consistent baseline level of code quality,
relative to other companies of a similar size.</p>
<h3 id="readability-and-me">Readability and Me</h3>
<p>My own readability pilgrimage was rough. Early on, I made the mistake
of submitting a minor improvement to a former intern’s research code for
readability progress. The assigned reviewer tore the entire file apart,
not just the changes I’d made! That was an unpleasant and major
unanticipated scope creep, and from what I learned later as a
readability mentor, that negative review probably set me back 5-10
diffs’ worth of readability progress. Frankly, that early review was a total
waste of both our time, and probably cost Google $5,000 in lost
productivity if you also account for the additional unnecessary
readability reviews I had to undergo.</p>
<p>Later, as a readability mentor, I got to enjoy reading regular rants
on the mentor-private mailing list about various language features. My
work on AutoGraph’s AST compiler magic got a special callout, which I’m
actually quite proud of :trollface: (For the record: I was the one
person on the team who tried very hard to product manage the scope of
AutoGraph’s magic to the smallest useful set of composable
transformations. I am quite aware of the usability dangers of magic
language features!) So I definitely got to see all of the ugliness that
the readability process was capable of generating.</p>
<p>Still, I thought that readability, done right, was valuable, and
signed up to be a mentor. At first, it took me well over an hour per
diff reviewed - a glacial pace of one line of code per minute! It is
difficult to read a random diff from a codebase you’ve never seen
before, whose conventions you are unfamiliar with, from a workstream
you’ve never heard of, where the local maintainer has already worked
with the author to bring the code to an acceptable bar of quality - and
try to contribute something useful to the review.</p>
<p>I could have taken the easy way out by picking at nits that the
linter missed. But to me, readability mentorship meant Engineering
Excellence, broadly interpreted. Beyond just style and testing, I
commented on code architecture, maintainability, library usage, systems
design, build-or-buy decisions, and much more. Having experienced the
unexpected scope creep review myself, I knew that I should not ask for
the author to rewrite their entire codebase. I had a patience budget I
was allotted for each review, and I tried to use it as wisely as I
could.</p>
<p>Several months into my service as a readability mentor, I found that
I could review code 10 times faster than when I first started. I could
understand, within minutes, the change’s intent, the context of why the
surrounding codebase might look the way it did, and sometimes even the
author’s career history. (e.g. “You seem to come from a Java background.
This visitor design pattern is more concisely expressed in Python using
custom tree iterators.”) I’ve become a measurably 10x engineer, at least
in this one specific ability to review code.</p>
<h2 id="readabilitys-role-in-google-culture">Readability’s role in
Google culture</h2>
<p>Readability is just the tip of the Google culture iceberg. In the
early days, Craig Silverstein, employee #1 at Google, would <a
href="https://abseil.io/resources/swe-book/html/ch03.html#readability_standardized_mentorship_thr:~:text=In%20Google%E2%80%99s%20early%20days%2C%20Craig%20Silverstein">carefully
and thoroughly review</a> every new hire’s first CL for best practices
and uniform style. I don’t know if Craig anticipated what readability
would become, but it’s safe to say that he and other early Googlers
understood the multiplicative returns of consistent code style, engineer
fungibility, excellent tooling and centralized systems on programmer
productivity.</p>
<p>Today, Google boasts a unified build system, monorepo, bug tracker,
containerization system, database systems, big data systems, protobufs,
and more. Many secondary systems work their magic, like the ability to
<a href="https://abseil.io/resources/swe-book/html/ch22.html">manage
refactoring diffs spanning millions of lines of code</a> across many
files, and the ability to <a
href="https://testing.googleblog.com/2023/04/sensenmann-code-deletion-at-scale.html">detect
and automate deletion of dead code</a>.</p>
<p>Google’s core technical thesis is that global conformity’s benefits
outweigh local inefficiencies. This engineering-centric culture probably
chases away many product-minded, entrepreneurial, and exploratory types,
to its detriment. In the Research org, I saw researchers who chafed at
readability and only tolerated Google systems to the extent that it got
them TPU compute time. On the positive side, Google attracts the world’s
best engineers, and it delivers technically superior results, even as
management pulls boneheaded product moves.</p>
<p>If you accept Google’s core thesis, readability is merely the scaling
mechanism and melting pot by which global conformity is accomplished. I
vividly remember one readability review for some thorny AutoGraph-laden
TensorFlow 2 code written by an engineer in a random not-researchy part
of Google. There were probably only 10 people in the world besides
myself who could have properly reviewed this code, and it happened to
land in my queue. I’m certain that among the readability mentors, there
are many other domain experts who distribute their expertise throughout
Google. The only other Google-wide melting pots I can think of are code
search and LLM-driven codegen tools, but these don’t have the human
touch that readability brings.</p>
<h2 id="should-your-company-implement-readability">Should your company
implement Readability?</h2>
<p>The defining features of readability at Google are</p>
<ol type="1">
<li>consensus on a bar for readability</li>
<li>a process for mentoring engineers until they qualify for
readability.</li>
<li>programmatic enforcement that every change should be authored or
reviewed by someone with readability.</li>
</ol>
<p>The first two criteria are uncontroversial. Many companies have style
guides, and many companies have a single-language codebase with no room
for cultural drift. Many companies also have a tacit expectation that
senior engineers will review junior engineers’ code until they can be
trusted to review each other’s code.</p>
<p>Criterion (3) is the most difficult to implement, both technically
and organizationally. GitHub has a protected branches feature, but it
doesn’t have any way to add concepts like “Python readability”, and I
don’t think GitHub is in any rush to implement readability.
Organizationally, the programmatic enforcement is what causes the most
grumbling, and I could easily see a VP undermining readability by
declaring their upcoming product launch is more important than enforcing
readability.</p>
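<p>To make criterion (3) concrete, here is a minimal sketch of the check a
submit queue could run. Everything in it is hypothetical - the roster file,
the extension-to-language mapping, and how your review tooling surfaces
authors and approvers will all vary:</p>
<pre><code># readability_gate.py - hypothetical pre-submit check, not Google's actual tooling.
import json
import os

EXTENSIONS = {".py": "python", ".java": "java", ".cc": "cpp", ".go": "go"}

def readability_satisfied(author, approvers, changed_files,
                          roster_path="readability.json"):
    """Pass only if, for every gated language touched by the change, the
    author or at least one approver appears in that language's roster."""
    with open(roster_path) as f:
        roster = json.load(f)  # e.g. {"python": ["alice", "bob"], "java": ["carol"]}
    touched = {EXTENSIONS.get(os.path.splitext(path)[1]) for path in changed_files}
    touched.discard(None)  # ignore files in languages we don't gate
    participants = set(approvers) | {author}
    return all(participants.intersection(roster.get(lang, ())) for lang in touched)
</code></pre>
<p>The hard part is not this logic; it’s keeping the roster honest and
defending the gate from the first VP with a launch deadline.</p>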
<p>What benefit would be worth putting a roadblock into the gameplay
loop of an engineer’s workflow? A safety-critical project, perhaps?
Alternatively, maybe you would like to replicate Google’s engineering
culture.</p>
<p>I personally disagree that Google’s global consistency outweighed
local inefficiencies. Apple and Amazon have a reputation for having a
very distinct working experience depending on where you land within the
company, and this is supposed to be bad. Yet, this also means that teams
can move quickly and without consulting the rest of the company. Google
felt like death by a thousand cuts; the larger Google got, the stronger
the pressures to use monolithic products and processes that were not
adaptable to every possible use case. I saw one particular story play
out repeatedly: “We never launched or demoed our research project
because our only deployment option was to go through all the
privacy/regulatory/legal/PR/Pubapprove signoff processes and spend
several months figuring out the TensorFlow serving stack.” Ultimately, I
believe it’s better to shard the company into smaller, more agile
divisions with distinct subcultures adapted to the needs at hand.</p>
<p>So, my answer is that no, companies should not implement Google’s
version of readability. This should be unsurprising; with how many
Xooglers are floating around, we would see more readability outside of
Google if it actually made sense. Simultaneously, Google’s culture has
too much momentum at this point; it should preserve the readability
process and embrace the types of products and problems that its
engineering-centric culture is best suited to.</p>
<h3 id="readability-lite">Readability Lite</h3>
<p>A <a href="https://promys.org/">summer math program</a> I attended in
high school had a phrase: “Prove or disprove and salvage if possible”.
Having disproven readability, I would like to salvage it by proposing
“Readability Lite”, which consists of:</p>
<ol type="1">
<li>consensus on a bar for readability</li>
<li>a process for mentoring engineers until they qualify for
readability.</li>
<li>a non-blocking mechanism to encourage people to get
readability.</li>
</ol>
<p>This variant salvages the mentorship program while hopefully
eliminating most of the grumbling. It differs from informal mentorship
because it creates opportunities to learn from engineers across the
company, not just within your own team, and it creates an organizational
expectation that engineers can and should strive to master their
craft.</p>
<p>For (1), I would suggest the bar should include: an understanding of
the language’s memory model; awareness, if not a solid grasp, of
language solutions to typical tasks (servers/clients,
serialization/deserialization, regex, metaprogramming, arrays, time,
I/O, logging, performance measurement/debugging/optimization);
understanding of the nuances of dependency management; good testing
practices (including how to architect for easy testability); and some
understanding of why the company’s technical choices suit its
technical/product requirements. Not all “typical tasks” will be
applicable to every company; pick and choose as appropriate! For (2),
you would want to find and incentivize senior/staff+ engineers who want
to mentor others, possibly through some sort of citizenship expectation
in performance evaluations. For (3), I think the simplest solution is to
make readability a requirement for promotion to senior engineer. [cue
flamewar on whether my bar is too high or too low for “senior”
engineers]</p>
<p>Implementation-wise, it’s best to start early. It’s difficult to
bootstrap readability past, say, 100 engineers, because you’d need to
get >20 senior engineers to agree on a bar for readability. If you
assume a quarter of the company has readability (1:3 senior:junior
ratio), that the company is growing at 20% YoY (+25% new hires -5%
attrition), then ~6% of the company’s engineers will need readability
every year. Assuming one readability mentor can mentor 3-5 or so people
to readability every year, 1-2% of all engineers, or about 5-10% of
senior engineers, need to be readability mentors.</p>
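<p>For the skeptical, here is the back-of-envelope arithmetic as a runnable
sketch (every number is an assumption from the paragraph above):</p>
<pre><code># Staffing math for Readability Lite; all inputs are assumptions.
readability_fraction = 0.25        # 1:3 senior:junior ratio
new_hires, attrition = 0.25, 0.05  # +25% hires - 5% attrition = 20% YoY growth

# Holders needed next year, minus holders surviving attrition:
target = readability_fraction * (1 + new_hires - attrition)
surviving = readability_fraction * (1 - attrition)
grants_needed = target - surviving
print(grants_needed)  # 0.0625 -> ~6% of engineers earn readability each year

mentees_per_mentor = 4  # assume each mentor carries 3-5 mentees per year
mentors = grants_needed / mentees_per_mentor
print(mentors)                         # ~1.6% of all engineers
print(mentors / readability_fraction)  # ~6% of senior engineers
</code></pre>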
<h2 id="conclusion">Conclusion</h2>
<p>I learned a great deal, both on the receiving and giving end of
Google’s readability process, and I’m really grateful that Google
invested so much in growing its engineering talent. I understand why
nobody else would want to implement Google’s readability, but I would
love to see more Readability Lite in the world. Please let me know if
you try this out at your company and how it turns out.</p>
Why did Google Brain exist?
<p> Originally posted 2023-04-26</p>
<p> Tagged: <a href="/essays/tags/machine_learning">machine_learning</a>, <a href="/essays/tags/system_dynamics">system_dynamics</a>, <a href="/essays/tags/popular">popular</a></p>
<hr />
<p><em><a
href="https://www.golem.de/news/brain-deepmind-hinter-den-kulissen-von-googles-ki-forschung-2306-174989.html">Auf
Deutsch lesen</a></em></p>
<p>This essay was originally written in December 2022 as I pondered the
future of my job. I sat on it because I wasn’t sure of the optics of
posting such an essay while employed by Google Brain. But then Google
made my decision easier by laying me off in January. My severance check
cleared, and last week, Brain and DeepMind merged into one new unit,
killing the Brain brand in favor of “<a
href="https://blog.google/technology/ai/april-ai-update/">Google
DeepMind</a>”. As somebody with a unique perspective and the unique
freedom to share it, I hope I can shed some light on the question of
Brain’s existence. I’ll lay out the many reasons for Brain’s existence
and assess their continued validity in today’s economic conditions.</p>
<h2 id="the-industry-research-lab">The Industry Research Lab</h2>
<p>I want to start by precisely describing the paradox that needs to be
explained.</p>
<p>Academics have always faced the dilemma of research freedom in
academia versus higher pay in industry. It’s not surprising that as a
machine learning expert, Google will pay you handsome sums to do ML. The
tradeoff is usually that you have to work on recommender systems, ad
optimization, search ranking, etc., instead of pure research.</p>
<p>To be clear, Brain hosts many researchers and projects, many of which
are directly or indirectly profitable. For example, many researchers
focus on improving optimizers, architecture search, and hyperparameter
search. This research is directly profitable, as it lowers compute cost
to achieve a given level of performance. I don’t think this needs any
further explanation.</p>
<p>What needs explanation is why Google Brain (alongside DeepMind,
OpenAI, FAIR, and others) funds hundreds of ML researchers to work on
pure research, seemingly just for research’s sake, while still
compensating an order of magnitude more than academia would. For
example, my team worked on <a href="/essays/rgb_odor/">machine learning
for olfaction</a>. What is Google doing, funding research on smell?
What’s the catch? This is the question I would like to answer.</p>
<h2 id="prestige">Prestige</h2>
<p>Most academics assume that Brain is angling for prestige: “Brain is
in a bidding war with other industrial research labs to hire the best
researchers, so that they can be the most prestigious research group,
which will in turn help them hire the best researchers”. After all, this
is how academia in the U.S. works: with a trinity of funding,
students/postdocs, and principal investigators (PIs). In principle,
funding goes to the most talented PIs and students/postdocs;
students/postdocs go where the most talented PIs and funding are; PIs go
where they can find talented students/postdocs and funding.</p>
<p>Universities are directly incentivized to maximize prestige, as they
take a (<a
href="https://austinhenley.com/blog/grantbudget.html">surprisingly
large</a>) cut of all research funding. Industry research labs don’t
have the same incentive structure. Rather than profiting from
maintaining a prestigious lab, it ends up costing more to keep top
researchers from defecting. Uber AI Labs seemed to exist solely for
prestige (ego?) reasons and was duly canceled by Dara “Adult
Supervision” Khosrowshahi after he took over from Uber founder Travis
Kalanick.</p>
<p>Prestige confers two main effects: a positive brand image in the
consumer space, and easier hiring, both within pure research and in
applied ML. For example, I hadn’t even considered applying to Apple
during my <a href="/essays/my_ml_path">job hunt</a> several years ago,
due to their lack of ML presence! Perhaps Apple didn’t recruit machine
learning experts precisely because they did not need machine learning
experts - a <a
href="https://blog.pragmaticengineer.com/apple-job-cuts-tide/">sensible
decision in line with Apple’s growth philosophy</a>. But if you do need
to hire several thousand ML engineers, it makes sense to fund a handful
of top ML researchers as a prestige play. I believe that my team,
working on ML for olfaction, was partly a prestige play.</p>
<p>Prestige-oriented research still makes sense today as a hiring
tactic, but given that the industry is collectively cutting recruiting
budgets, prestige spending must also be reduced.</p>
<h2 id="mbas-and-golden-eggs">MBAs and Golden Eggs</h2>
<p>The next obvious reason for Google to invest in pure research is for
the breakthrough discoveries it has yielded and can continue to
yield.</p>
<p>As a rudimentary brag sheet, Brain gave Google TensorFlow, TPUs, <a
href="https://arxiv.org/abs/1609.08144">significantly improved
Translate</a>, <a href="https://jax.readthedocs.io/en/latest/">JAX</a>,
and <a href="https://arxiv.org/abs/1706.03762">Transformers</a>. These
are just the projects that were {pure research at inception} X {have
significant profit impact today}; if I loosen either constraint, the
list would be far longer, e.g. <a
href="https://jamanetwork.com/journals/jama/article-abstract/2588763">ML
for medical imaging</a>, and <a
href="https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html">AutoML</a>,
to name a few.</p>
<p>Brain’s freewheeling, bottom-up, researcher-centric culture was
arguably what generated these breakthroughs. Jeff Dean is in charge of
research, precisely because he embodies these ideals. If instead an MBA
were in control, the MBA culture would trickle down, killing the goose
that lays the golden eggs. Better to hand over the golden eggs after
they’re laid and keep the MBAs in the shadows.</p>
<p>Over time, two trends have empowered MBAs to act more openly. The
first is the economic backdrop: with a tightening economy and with
increased competition from OpenAI/VC-funded AI startups, Google feels a
need to be more responsible and directed about its research investments.
The second is increased familiarity with ML’s capabilities. In the early
days of deep learning, nobody knew what it could be capable of, and the
researchers were given the privilege of charting a research vision.
Today, thought leaders casually opine on how and where ML will be
useful, and MBAs feel like this is an acceptable substitute for expert
opinion. The result is reduced researcher freedom and more top-down
direction.</p>
<p>As an amusing anecdote, Google’s researcher promotion criteria were
for some time linked to external recognition of research significance.
If Google’s promo committees, formed of senior researchers, can’t even
decide whether their own research is significant, then what chance would
MBAs have? In the very near future, I would expect researcher promotion
criteria to shift towards delivered business value, rather than external
recognition of research impact.</p>
<p>Today, I see a similar wave of researcher empowerment with LLMs, as
once again, nobody but the researchers can credibly opine on their
capabilities. Even then, every LLM researcher can feel MBAs breathing
down their necks in a way that wasn’t the case during deep learning’s
ascendancy.</p>
<h2 id="the-51-attack">The 51% attack</h2>
<p>Another reason for Google’s funding of open-ended research is to
maintain its lead in machine learning.</p>
<p>Google had stayed ahead of the industry for well over a decade in
large-scale systems programming. Systems like <a
href="https://research.google/pubs/pub62/">MapReduce</a> (Hadoop), <a
href="https://research.google/pubs/pub39966/">Spanner</a> (CockroachDB),
and <a href="https://research.google/pubs/pub48190/">Zanzibar</a>
(AuthZed) solved problems that the industry was only beginning to
realize were problems, and it took 5-10 years for viable alternatives
(indicated in parentheses), copycatting the corresponding Google
whitepapers, to be available to competitors.</p>
<p>When Google open-sourced TensorFlow, it was clear that they had done
it again. This early success suggested that Google would be able to stay
ahead of its competitors by generously funding long-shot research
bets.</p>
<p>Unfortunately, this early lead would be completely squandered within
a few short years, with PyTorch/Nvidia GPUs easily overtaking
TensorFlow/Google TPUs. ML was, and frankly is, still too nascent to
have significant technical barriers to entry. The sustained eye-popping
funding for AI companies generated a surge in supply, with the number of
ML researchers growing ~25% YoY for the past decade. I taught myself
enough ML to blend in with the researchers at Brain over a relatively
short 2 years, and so have many others. Nobody, not even Google, can
afford to throw money into a bottomless pit.</p>
<p>Developing an early lead in a field (<em>cough</em> Transformers
<em>cough</em>) is also only valuable to the extent that Google can
translate that research edge into product. Brain’s <a
href="https://docs.google.com/presentation/d/1WrkeJ9-CjuotTXoa4ZZlB3UPBXpxe4B3FMs9R9tn34I/edit#slide=id.g164b1bac824_0_3835">recent
talent exodus</a> is in no small part due to internal perception that
Google was sitting on groundbreaking research rather than developing it
to its potential. ChatGPT raised serious existential questions for
Brain. If we take Google’s inability to execute on research translation
as a constant, then does it even make sense to invest internally in
open-ended speculative research? Google’s <a
href="https://www.ft.com/content/583ead66-467c-4bd5-84d0-ed5df7b5bf9c">$400M
investment in Anthropic AI</a> is a bad look: Google execs are hedging
their research bets on external research groups.</p>
<h2 id="catalyst-theory">Catalyst theory</h2>
<p>One unusual thing about Brain is its liberal publication policy - <a
href="https://medium.com/criteo-engineering/neurips-2020-comprehensive-analysis-of-authors-organizations-and-countries-a1b55a08132e">Brain
often outpublishes entire universities at top-tier ML conferences</a>.
Having invested vast sums into open-ended research, why give it away for
free? The major reasons for publication are 1) prestige and 2) because
researchers can quit and take the knowledge with them anyway. A more
subtle reason is 3) to catalyze growth in a field.</p>
<p>The catalyst theory is that by publishing key research in areas
relevant to Google’s core business, that research direction will move in
a way that benefits Google. For example, Google has always been
interested in better NLP, and the publication of key research like <a
href="https://research.google/pubs/pub43155/">seq2seq in 2014</a> and <a
href="https://arxiv.org/abs/1706.03762">Transformers in 2017</a>
catalyzed the growth of the entire NLP field. Google is one of the few
companies with both the consumer surface area and the computational
might to scale up ML deployments to a billion users, so Google benefits
from the overall advancement of the field.</p>
<p>In peacetime mode, it makes sense to spend $X to grow the overall
pie, as long as your slice of the pie grows more than $X. In wartime
mode, it also matters how much your competitors’ slice of the pie is
growing. OpenAI’s alliance with Microsoft means that there is another
giant out there with both the consumer surface area and the
computational might to scale up ML deployments. As Google transitions to
<a href="https://a16z.com/2011/04/14/peacetime-ceo-wartime-ceo/">wartime
mode</a>, the catalyst theory is almost certainly dead at Google.</p>
<h2 id="retainer-fee">Retainer fee</h2>
<p>DARPA’s mission statement is “to prevent and create technological
surprise”. The best defense is a good offense, but it certainly doesn’t
hurt to have a stable of technical experts who can quickly understand
and respond to unexpected developments in the field. When times are
good, the experts can focus on original research, and when times are
bad, the experts will be drafted to work on defensive projects. Seems
reasonable, although the drawback of this plan is that there is no way
to guarantee that the experts will actually stick around when you cancel
their pet projects. To remove any doubt, you could also just lay them
off 🙂. Sarcasm aside, Google wasn’t wrong to lay me off; the fact that
I started writing this essay 5 months ago was a strong indicator to me
back then that I should be looking for new jobs.</p>
<p>Times are now bad. I expect to see Google call upon its researchers
to focus on LLMs, first with the carrot, and then the stick.</p>
<h2 id="tech-hubris">Tech Hubris</h2>
<p>Many of Brain’s open-ended research projects are quite
interdisciplinary in nature. As previously mentioned, my team worked on
<a
href="https://ai.googleblog.com/2022/09/digitizing-smell-using-molecular-maps.html">ML
for olfaction</a>, and Brain is also pioneering advances in ML for <a
href="https://jamanetwork.com/journals/jama/fullarticle/2588763">medical
imaging</a>, <a
href="https://ai.googleblog.com/2020/03/a-neural-weather-model-for-eight-hour.html">weather
modeling</a>, <a
href="https://ai.googleblog.com/2018/07/improving-connectomics-by-order-of.html">neuronal
imaging</a>, <a
href="https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html">DNA
variant calling</a>, <a href="https://magenta.tensorflow.org/">music and
art</a>, <a
href="https://ai.googleblog.com/2022/03/using-deep-learning-to-annotate-protein.html">protein
annotation</a>, and probably more that I’ve missed. There are many
success stories outside of Brain, like AlphaGo and AlphaFold.</p>
<p>There is no question that these interdisciplinary efforts have
yielded much fruit. However, two countervailing trends have reduced
Google’s willingness to continue funding these efforts.</p>
<p>The first is researcher demographics. There is nothing quite as
annoying as a <a href="https://xkcd.com/793/">physicist first
encountering a new field</a>. On the other hand, there is nothing quite
as transcendental as a domain expert learning physics (or in this case,
machine learning). Given the long timelines of a PhD program, the vast
majority of early ML researchers were self-taught crossovers from other
fields. This created the conditions for excellent interdisciplinary work
to happen. This transitional anomaly is unfortunately mistaken by most
people to be an inherent property of machine learning to upturn existing
fields. It is not.</p>
<p>Today, the vast majority of new ML researcher hires are freshly
minted PhDs, who have only ever studied problems from the ML point of
view. I’ve seen repeatedly that it’s much harder for an ML PhD to learn
chemistry than for a chemist to learn ML. (This may be survivorship
bias; the only chemists I encounter are those that have successfully
learned ML, whereas I see ML researchers attempt and fail to learn
chemistry all the time.) In any case, I expect the quality and success
rate of later interdisciplinary projects to drop correspondingly. Even
if Google execs don’t understand the nature of the trend, they will
notice the decreasing quality of the breakthroughs.</p>
<p>The second is that from a business perspective, it turns out it is
much easier for incumbents to learn machine learning than it is for
Google to learn a new business field. <a
href="https://www.healthcaredive.com/news/google-disbands-health-unit-as-chief-departs-for-cerner/605387/">Google
Health</a> is the most prominent example, but I have seen this pattern
play out repeatedly in other domains. I am skeptical that DeepMind’s
Isomorphic Labs will get much further. On the other hand, companies like
Recursion Pharmaceuticals and Relay Therapeutics, staffed with a mix of
career biologists and chemists-turned-ML engineers, have done well. The
benefits of interdisciplinary ML breakthroughs seem to go to incumbents,
and do not form a strong basis for a new business line for Google.</p>
<h2 id="the-brain-deepmind-merger">The Brain-DeepMind Merger</h2>
<p>Where to begin? My thoughts on this are jumbled, and in the interest
of a timely blog post, I will present them in bulleted list form…</p>
<ul>
<li>Google execs apparently thought the DeepMind branding was stronger
than Brain branding. Alternatively, Demis refused to sign off on the
merger unless the DeepMind name stayed.</li>
<li>This merger is probably a prelude to a greater restructuring.</li>
<li>Neither side “won” this merger. I think both Brain and DeepMind
lose. I expect to see many project cancellations, project mergers, and
reallocations of headcount over the next few months, as well as
attrition.</li>
<li>With fewer projects to go around, I expect to see a lot of middle
management get cut or leave.</li>
<li>I expect there to be a lot of turbulence due to DeepMind’s top-down
culture clashing with Brain’s bottom-up culture. The turbulence will
bring any merger efficiency gains down to, or even below zero.</li>
</ul>
<h2 id="the-road-ahead">The road ahead</h2>
<p>Despite Brain’s tremendous value creation from its early funding of
open-ended ML research, it is becoming increasingly apparent to Google
that it does not know how to capture that value. Google is of course not
obligated to fund open-ended research, but it will nevertheless be a sad
day for researchers and for the world if Google turns down its
investments.</p>
<p>Google is already a second-mover in many consumer and business
product offerings and it seems like that’s the way it will be in ML
research as well. I hope that Google at least does well at being second
place. There’s lots of room for winners in machine learning.</p>
A Research Codebase Manifesto
<p> Originally posted 2023-02-14</p>
<p> Tagged: <a href="/essays/tags/software_engineering">software_engineering</a>, <a href="/essays/tags/machine_learning">machine_learning</a>, <a href="/essays/tags/python">python</a>, <a href="/essays/tags/popular">popular</a></p>
<hr />
<p><em>Note: Multiple people have told me that this essay could equally
well have been titled “A Startup Codebase Manifesto”. YMMV.</em></p>
<p>At Google Brain, I was the tech lead of a team with multiple
researchers and engineers actively running experiments and committing
changes to a shared codebase. This codebase has generated feedback like
“you have no idea how much i miss our old codebase”, “this is a textbook
example of what a research codebase should look like”, and “I was
curious how company X’s research codebase would look and it’s a complete
mess compared to your codebase”. (For curious googlers: you can find
this codebase if you search for CLs submitted by brianklee@).</p>
<p>Managing a research codebase is difficult. I have heard of other
research teams that attempted to join their many research subprojects’
codebases, only to run into issues around code ossification, slower
iteration cycles, and general researcher frustration. Yet other research
teams, wary of these issues, embrace the academic baseline of untended
anarchy (yes, even at Google).</p>
<p>Here are some of the lessons I’ve learned in helping our team make
the best use of our codebase.</p>
<p>For some context, our team was roughly a 1:2 mix of engineers to
researchers, and we worked on machine learning applied to molecular
property prediction and representation learning. My advice is probably
more useful for industry research groups and less useful for academic
research groups. It will be difficult to bootstrap this type of codebase
discipline without an engineering champion in your group.</p>
<h2 id="codebase-evolution">Codebase evolution</h2>
<p>Writing a one-person research codebase is easy. The difficulty arises
when you try to maintain this codebase over multiple people and over
time. Software engineering best practices are designed to alleviate
these issues, but the usual recommendations don’t always work, because
research codebases change far faster than product codebases. The stakes
are higher, too - a stagnant product codebase can still generate
business value, but a stagnant research codebase simply fails at its
core purpose: to investigate and evaluate new ideas.</p>
<p>Here are some of the most common ways research teams respond to
evolving research interests.</p>
<ul>
<li>Change code without caring about compatibility. The result is spooky
breakage at a distance. A researcher can check in changes that progress
their research by 1x and retard everyone else’s research by 0.5x each,
for a net drag on productivity. If everybody has their own solo
codebase, then there are fewer costs to breakage, but also fewer
benefits to collaboration. (This is the academic default.)</li>
<li>Carefully update code, maintaining compatibility with project code.
As older projects accumulate, the backwards compatibility tax grows and
grows.</li>
<li>Don’t change code. Often groups end up in this category because
their research group turned into a product group, but regardless of the
reason, it spells the death of new research.</li>
<li>Start over from scratch, copying code snippets from the old codebase
as needed.</li>
</ul>
<p>Each strategy has its pros and cons. I found the following strategy
effective within my team.</p>
<h2 id="the-three-tier-codebase">The Three-Tier Codebase</h2>
<p>This strategy is a mix of approaches (2) and (4); a sketch of the
resulting directory layout follows the list.</p>
<ul>
<li>Core. Libraries for reusable components like cloud data storage,
notebook tooling, neural network libraries, model
serialization/deserialization, statistics tests, visualization, testing
libraries, hyperparameter optimization frameworks, wrappers and
convenience functions built on top of third-party libraries. Engineers
typically work here.
<ul>
<li>Code is reviewed to engineering standards. Code is tested, covered
by continuous integration, and should never be broken. Very low
tolerance for tech debt.</li>
<li>Breaking changes to core code should be accompanied by fixes to
affected project code. The project owner should assist in identifying
potential breakage. No need to fix experimental code.</li>
</ul></li>
<li>Projects. A new top-level folder for each major effort (rough
criterion: a project represents 1-6 months of work). Engineers and
researchers work here.
<ul>
<li>Code is reviewed for correctness. Testing is recommended but
optional, as is continuous integration.</li>
<li>No cross-project dependencies. If you need code from a different
project, either go through the effort of polishing the code into core,
or clone the code.</li>
</ul></li>
<li>Experimental. Anything goes. Typically used by researchers. I
suggest namespacing by time (e.g. a new directory every month).
<ul>
<li>Rubber-stamp approvals. Code review is optional and comments may be
ignored without justification. Do not plug this into continuous
integration.</li>
<li>The goal of this directory is to create a safe space for researchers
so that they do not need to hide their work. By passively observing
research code “in the wild”, engineers can understand research pain
points.</li>
<li>Any research result that is shared outside the immediate research
group may not be derived from experimental code.</li>
</ul></li>
</ul>
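<p>Concretely, a repository following this structure might look something
like the sketch below (all file and folder names are hypothetical):</p>
<pre><code>repo/
  core/                 # engineering standards: reviewed, tested, CI
    storage.py
    gnn.py
  projects/             # one folder per 1-6 month effort; no cross-project deps
    solubility_2023/
      train.py
  experimental/         # anything goes; rubber-stamped, never depended upon
    2023_02_alice/      # namespaced by time (and owner)
      scratch.ipynb
</code></pre>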
<p>The key idea is that when project-specific code is not generating
research value, it is not worth upkeep and should be amputated. By
configuring project-specific code to be amputation-ready, the codebase
as a whole stays healthier. If this feels strange to you, remember that
your job in a research group isn’t to write code, it’s to do research,
and this remains true whether your job description says Engineer or
Researcher/Scientist.</p>
<p>This structure solves for some tricky dynamics, which I will explain
further.</p>
<h3 id="engineerresearcher-collaboration">Engineer/Researcher
collaboration</h3>
<p>Tensions arise when engineers and researchers interact in a single
codebase. Engineers have a shared understanding of software best
practices, e.g. testing code, reusable functions, single-responsibility
principle, etc. Researchers, on the other hand, don’t see the benefits
of such best practices and resent the drag on their individual
productivity.</p>
<p>This tension most commonly manifests during code review. Engineers
tend to impose demands on researchers’ code before it can be checked in,
whereas researchers tend to rubber-stamp each others’ code, leaving
engineers to feel like they are permanently on clean-up duty.
Researchers, annoyed by the slowdown in code velocity, will evade the
code review mechanism by iterating in private on a solo code repository
or by working entirely in notebooks instead of proper modules.
Engineers’ tools go underutilized because codebases are not
integrated.</p>
<p>One of the strengths of the three-tier codebase is that it helps
engineers and researchers collaborate by setting code review
expectations. The benefits include healthier team dynamics, increased
probability of correctness, mutual learning opportunities, and overall a
happier team.</p>
<h3 id="keeping-track-of-code">Keeping track of code</h3>
<p>Another strength of the three-tier codebase is centralization of
code. Centralization creates a single source of truth, encourages core
code reuse, and streamlines workflows. It’s important enough that the
sole purpose of the Experimental directory is to discourage the creation
of private codebases. A Colab notebook on Google Drive or an unpushed
git branch on your laptop’s hard drive count as private codebases, in
this reckoning. Ultimately, a shared codebase is a foundation for shared
progress and learning.</p>
<p>In the absence of centralization, many inefficiencies arise. If you
haven’t struggled to recover the precise version of some notebook that
generated figure 4 in your paper, which Reviewer 2 is now critiquing,
then have you really done research? What about haggling with your IT
department’s privacy lawyers to try and salvage a python notebook from a
former intern’s returned laptop?</p>
<p>That being said, you shouldn’t bother checking in every snippet of
throwaway code. A good rule of thumb is that you should check in code
only if the result was interesting enough to share with your team. (I
mean result in a general sense: an explanation, knowledge, a specific
number, and <em>especially</em> a dataset.) If you wouldn’t pollute
their mindspace during group meetings, why would you pollute the
codebase?</p>
<h2 id="a-comment-on-notebooks">A comment on notebooks</h2>
<p>Some people hate notebooks because they are sometimes not much more
legible than a transcript of an interpreter session. They can even
introduce new and exciting failure modes, usually due to out-of-order
execution or hidden state due to overwritten/deleted cells. Yet, they’re
an indispensable part of the research toolkit.</p>
<p>Not all notebooks are worth checking in. As mentioned before, a good
cutoff criterion is whether the notebook generates a research result
that you thought interesting enough to share with your team. When you
check in a notebook, the following steps will minimize unnecessary
sadness for future readers and users of the notebook (including
yourself):</p>
<ul>
<li>delete nonessential cells</li>
<li>check in cell output (but do trim noisy/verbose output)</li>
<li>restart the kernel and run your notebook from top to bottom to check
for out-of-order execution issues (a sketch of automating this follows
below)</li>
</ul>
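<p>The last step is mechanical enough to automate. A minimal sketch,
assuming <code>jupyter nbconvert</code> is installed (the filename is a
placeholder):</p>
<pre><code># Re-execute a notebook top to bottom before check-in; a nonzero exit
# means some cell fails when run in order.
import subprocess

subprocess.run(
    ["jupyter", "nbconvert", "--to", "notebook", "--execute",
     "--inplace", "figure4_analysis.ipynb"],
    check=True,
)
</code></pre>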
<p>Despite my statement about experimental being “anything goes”, I do
think the above steps are easy enough that they should be insisted upon
even for experimental code.</p>
<h2 id="keeping-up-with-the-times">Keeping up with the times</h2>
<p>One final pathology endemic to research codebases is the build-or-buy
dilemma. By their very nature, research codebases are typically on the
cutting edge of what people are interested in building, and there are
rarely well-built libraries for the thing you are trying to accomplish.
So at first, build is really the only option. But unless you have a
large enough engineering budget (cough DeepMind cough) that you can
create your own ecosystem of well-polished first-party solutions, time
will eventually produce a third-party solution that does it better.</p>
<p>The three-tier codebase forces an explicit decision to polish and
promote project code into core. Good judgment is necessary to decide
whether to polish something into core or to procrastinate by just
copying old code into a new project directory. Neither decision is
necessarily wrong. My hit rate was roughly 60%: about that share of our
core libraries remain the best available solutions to their problems,
which seems decent. As a case study from the other 40%, consider our
graph neural network library.</p>
<p>We built our own TF2 graph neural network (GNN) library in late 2019,
mere months after TF2’s release. It was customized for molecules, taking
advantage of carbon’s four-valence constraint to optimize the adjacency
list representation. I was the resident TF2 expert in the Cambridge
research office, so it seemed like a natural choice at the time. But if
we had to restart today I would probably go with <a
href="https://github.com/deepmind/jraph">JAX/Jraph</a>, publicly
released in late 2020.</p>
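<p>To illustrate the valence trick (the shapes and padding convention here
are my guesses, not our library’s actual implementation): if every heavy
atom has at most four bonds, the ragged adjacency list collapses into a
dense (n_atoms, 4) array, and a message-passing step becomes a single
gather:</p>
<pre><code># Fixed-valence adjacency sketch; assumes only numpy.
import numpy as np

# Ethanol heavy atoms: C-C-O. Each atom gets exactly 4 neighbor slots,
# padded with -1 where a bond is absent.
neighbors = np.array([
    [1, -1, -1, -1],   # atom 0 (C) bonded to atom 1
    [0,  2, -1, -1],   # atom 1 (C) bonded to atoms 0 and 2
    [1, -1, -1, -1],   # atom 2 (O) bonded to atom 1
])

features = np.random.randn(3, 8)                  # (n_atoms, feature_dim)
padded = np.vstack([features, np.zeros((1, 8))])  # index -1 hits this zero row
messages = padded[neighbors].sum(axis=1)          # (n_atoms, feature_dim)
</code></pre>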
<p>We never made the jump to JAX/Jraph, because the cost-benefit never
seemed to be worth it. (JAX’s static shape requirements and
serialization weaknesses significantly increased the migration costs,
while our small molecule datasets limited the upside to better GNN
architectures.) While the existing GNN libraries worked well for what we
were doing, it impeded new research in subtle ways - hypergraph or
multi-molecule architectures were forever on our horizon because they
were difficult to implement. I overinvested in our GNN libraries and
they subsequently got interwoven into our workflows, making it difficult
to migrate away.</p>
<p>The “obvious” solution is to cut your losses early and migrate as
soon as a better library is identified, but that’s easier said than
done. It’s particularly impressive if you can identify and switch to a
library with a better trajectory, even before it reaches feature parity
with your existing libraries. DeepMind’s early 2020 decision to shift
their entire organization to JAX continues to impress me with its
foresight.</p>
<h2 id="parting-thoughts">Parting thoughts</h2>
<p>Academic research groups often complain about the inequality of
resources relative to industrial research groups. Compute resources
typically come to mind, but another important inequality is industry’s
ability and willingness to hire engineers alongside researchers. The
difference is structural: even though engineering wisdom is free and
readily available online to any grad student who wishes to obtain it, it
rarely happens because that’s not what gets you your PhD. And even then,
merely hiring an engineer is not enough to make your research group more
productive. Integrating the research and engineering worlds requires
researchers to understand when engineering is necessary, and engineers
to understand when it is not. It’s a culture shift that’s hard to pull
off.</p>