Better ways to name your utils module

Originally posted 2023-09-06

Tagged: software engineering

Obligatory disclaimer: all opinions are mine and not of my employer


As the joke goes, the two hardest problems in computer science are 1) cache invalidation and 2) naming. Like all good jokes, there’s a kernel of truth in there. Naming is poetry; the perfect name has both precise meaning but also precise connotation. Naming is also mathematics; the judicious choice of which concepts deserve a name can trivialize a problem. In software engineering terms, good naming is equivalent to good factoring of the problem domain into the right abstractions and APIs.

A utils.py module is a failure of naming. Let’s talk about ways we can improve the situation.

utils.py Considered Harmful

see also: helper.py, misc.py

First, a quick rundown: why exactly are utility modules considered harmful?

At a philosophical level: code should make its intent clear. Nobody would let a function name like do_stuff() pass code review. So why tolerate an equally ambiguous module name?

At a practical level: utility modules tend to accumulate dependencies, causing everything to depend on everything via the utils bottleneck. It’s a great breeding ground for inadvertent circular dependencies.

At a social level: the existence of one module named utils.py implicitly grants permission to create more, leading up to that doleful moment when you have to resolve a name collision between two or more utils modules.

Not Considered Harmful: foo_utils.py

While the ideal codebase should not have any utils.py in it, a pragmatic compromise is to categorize your utility code. foo_utils.py demonstrates intention: it is about foo, but more importantly, it is not about stuff that is not foo. utils/foo.py is also okay.

Sorting your utilities

I’ve seen many flavors of utility code which could be easily sorted into more appropriate categories. See if any of the following categories match your code:

$PLATFORM_utils - hacks, workarounds, and codified usage patterns for a platform’s deficiencies and inconveniences. Retry/backoff logic? Concurrency and consistency workarounds? Missing primitives? Auth? Environment management?

testing_utils - make the testing process easier (randomness, parametrization, fuzzing, customized assertions, etc.). Do not put test fixtures or mocks here! Those belong alongside the unit tests that consume them. There are lots of great libraries out there for making your tests better - mock, parametrized, hypothesis, to name a few.

$DOMAIN_SPECIFIC_CONCEPT - Your domain probably has some domain-specific concepts that are not obvious to outsiders. Middleware? Augmentation? Symmetries? If you’re relatively inexperienced in the problem domain, you may not realize these concepts exist, and reinvent them poorly in the utils module. Read other OSS codebases, papers, or books to learn what these concepts are. A special-shoutout goes to parsers and compilers, which people reinvent badly on a regular basis.

base - Foundational data types, definitions, and concepts that are used pervasively throughout the codebase. This will get imported everywhere, so keep a strict watch on its dependencies.

$SYSTEM_client: When data generated by one system is consumed by another, and their APIs don’t quite align, then some adaptor code is needed to munge the data formats. If both systems are under your control, you should figure out a better API. If one or both of those systems is from a third party provider, then you have no choice but to write adaptor code. As a company grows, it’s pragmatic for teams to start treating each other as third parties, depending on org chart distance.

visualizations: often used interactively and tends to invoke libraries with unique GUI or system dependencies that don’t work on CI or other headless deployments.

single-use code: Code that only has one caller. This often makes its way into utility modules in an attempt to hide ugly code. You should keep this code right next to its caller, or inline the code. Nobody is being fooled by the indirection.

Codebase maintainers: Do you have a utils problem?

Try measuring the percentage of code that lives in “utils” modules. You can accomplish this by running cloc on your codebase, and then running find . -name "utils.py" | cloc --list-file=- to get util-specific metrics.

Broken down by percent of code in utils modules:

  • 0-2%: Healthy.
  • 2-10%: Unhealthy. Share this essay with your team and discourage further additions to utils.py
  • 10+%: Morbid. Your codebase – or possibly your management – needs an intervention of some sort.

If you have a healthy amount of code in utils modules, congratulations! I’d suggest writing up the cloc commands as a script to monitor regression. Of course, beware Goodheart’s law, and don’t hold anyone to this metric!