Better ways to name your utils module
Originally posted 2023-09-06
Obligatory disclaimer: all opinions are mine and not of my employer
As the joke goes, the two hardest problems in computer science are 1) cache invalidation and 2) naming. Like all good jokes, there’s a kernel of truth in there. Naming is poetry; the perfect name has both precise meaning but also precise connotation. Naming is also mathematics; the judicious choice of which concepts deserve a name can trivialize a problem. In software engineering terms, good naming is equivalent to good factoring of the problem domain into the right abstractions and APIs.
A utils.py module is a failure of naming. Let’s talk about ways we can improve the situation.
First, a quick rundown: why exactly are utility modules considered harmful?
At a philosophical level: code should make its intent clear. Nobody
would let a function name like
do_stuff() pass code review.
So why tolerate an equally ambiguous module name?
At a practical level: utility modules tend to accumulate dependencies, causing everything to depend on everything via the utils bottleneck. It’s a great breeding ground for inadvertent circular dependencies.
At a social level: the existence of one module named
utils.py implicitly grants permission to create more,
leading up to that doleful moment when you have to resolve a name
collision between two or more
Not Considered Harmful: foo_utils.py
While the ideal codebase should not have any
it, a pragmatic compromise is to categorize your utility code.
foo_utils.py demonstrates intention: it is about
foo, but more importantly, it is not about stuff that is
utils/foo.py is also okay.
Sorting your utilities
I’ve seen many flavors of utility code which could be easily sorted into more appropriate categories. See if any of the following categories match your code:
$PLATFORM_utils - hacks, workarounds, and codified usage
patterns for a platform’s deficiencies and inconveniences. Retry/backoff
logic? Concurrency and consistency workarounds? Missing primitives?
Auth? Environment management?
testing_utils - make the testing process easier
(randomness, parametrization, fuzzing, customized assertions, etc.). Do
not put test fixtures or mocks here! Those belong alongside the unit
tests that consume them. There are lots of great libraries out there for
making your tests better -
hypothesis, to name a few.
$DOMAIN_SPECIFIC_CONCEPT - Your domain probably has some
domain-specific concepts that are not obvious to outsiders. Middleware?
Augmentation? Symmetries? If you’re relatively inexperienced in the
problem domain, you may not realize these concepts exist, and reinvent
them poorly in the utils module. Read other OSS codebases, papers, or
books to learn what these concepts are. A special-shoutout goes to
parsers and compilers, which people reinvent badly on a regular
base - Foundational data types, definitions, and
concepts that are used pervasively throughout the codebase. This will
get imported everywhere, so keep a strict watch on its dependencies.
$SYSTEM_client: When data generated by one system is
consumed by another, and their APIs don’t quite align, then some adaptor
code is needed to munge the data formats. If both systems are under your
control, you should figure out a better API. If one or both of those
systems is from a third party provider, then you have no choice but to
write adaptor code. As a company grows, it’s pragmatic for teams to
start treating each other as third parties, depending on org chart
visualizations: often used interactively and tends to
invoke libraries with unique GUI or system dependencies that don’t work
on CI or other headless deployments.
single-use code: Code that only has one caller. This often makes its way into utility modules in an attempt to hide ugly code. You should keep this code right next to its caller, or inline the code. Nobody is being fooled by the indirection.
Codebase maintainers: Do you have a utils problem?
Try measuring the percentage of code that lives in “utils” modules.
You can accomplish this by running
cloc on your codebase,
and then running
find . -name "utils.py" | cloc --list-file=- to get
Broken down by percent of code in utils modules:
- 0-2%: Healthy.
- 2-10%: Unhealthy. Share this essay with your team and discourage further additions to utils.py
- 10+%: Morbid. Your codebase – or possibly your management – needs an intervention of some sort.
If you have a healthy amount of code in utils modules,
congratulations! I’d suggest writing up the
as a script to monitor regression. Of course, beware Goodheart’s law,
and don’t hold anyone to this metric!