Simplifying Fluffy Constructors in Unit Tests

2023-09-23

The archetypal unit test looks like this:

arg1 = ...
arg2 = ...
expected_output = ...
actual_output = function_to_test(arg1, arg2)
assertEqual(expected_output, actual_output)

A very common problem is that, over time, objects accumulate fields and subobjects, until it takes significant effort just to construct an object. Constructing arg1, arg2, and expected_output can take hundreds of lines, while the function call and the assertion are just two lines. These tests are like cotton candy: a tremendous amount of fluff with a tiny core. Well, at least cotton candy is tasty. This fluff is tedious to write, tedious to review, and tedious to scroll through, which leads to less unit testing than is optimal. It’s like chatting with that overly friendly downstairs neighbor who takes thirty minutes to tell you that the condo insurance is up for renewal.

The most common coping mechanism for fluffy constructors is the singleton: one example object that feeds into every test. Often, this singleton ends up in the setUp() method shared by all tests. The many fields of the shared singleton are pinned by various different unit tests’ assertions, and gradually it becomes impossible to either customize the object, or to add new unit tests. When the test class reaches this point, the process starts all over with a new freshly made singleton object and a new test class. This seems a little bit silly. But how can we do better?

Factory methods hide fluff

The first step towards simplifying fluffy tests is to decide which details are relevant.

Take this test:

car1 = Vehicle(
    mass_kg=2000,
    location=Location(x_m=0, y_m=0),
    velocity=Velocity(x_m_s=4, y_m_s=3),
    heading=math.atan2(3, 4),
    width_m=1.8,
    length_m=4.0,
    emergency_vehicle=False,
    )
car2 = Vehicle(
    mass_kg=2000,
    location=Location(x_m=4, y_m=-2),
    velocity=Velocity(x_m_s=0, y_m_s=5),
    heading=math.atan2(5, 0),
    width_m=1.8,
    length_m=4.0,
    emergency_vehicle=False,
    )
self.assert(car1.speed_m_s) = 5
self.assert(car2.speed_m_s) = 5
self.assertTrue(willColideWithin5sec(car1, car2))

Many of these fields are irrelevant, so we may as well hide them behind a factory method that sets sensible defaults.

car1 = make_suv(
    location=Location(x_m=0, y_m=0),
    velocity=Velocity(x_m_s=4, y_m_s=3),
    )
car2 = make_suv(
    location=Location(x_m=4, y_m=-2),
    velocity=Velocity(x_m_s=0, y_m_s=5),
    )
self.assert(car1.speed_m_s) = 5
self.assert(car2.speed_m_s) = 5
self.assertTrue(willColideWithin5sec(car1, car2))

You might object that factory methods just hide the fluff. It’s true that if you only have one unit test, this new solution is the same number of lines of code. But as the marginal cost of testing drops, you’ll get more tests. It’s also easier to manually verify that the unit test is correct.

DSLs hide syntactic fluff

In certain cases, the fluff is due to language syntax itself! You might think it isn’t possible to eliminate this type of fluff, but writing your own DSL is a powerful technique to do just that.

Which would you rather see?

go_board = np.array([
    [go.KO,1,1,0,0,0,0,0,0],
    [1,-1,0,0,0,0,0,0,0],
    [-1,0,-1,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
    [0,0,0,0,0,0,0,0,0]
])

go_board = parse_board('''
    *XX......
    XO.......
    O.O......
    .........
    .........
    .........
    .........
    .........
    .........
''')

The latter contains far less visual noise, with half the total characters. It features a sensible null character, flexible whitespace for convenient embedding of inline data, and monospaced content. Definitely easier to read and write.

In many cases, you can reuse existing DSLs instead of having to create your own. This lets you even skip writing the parser - you can use a library for that. Instead of manually constructing a Pandas dataframe, why not just embed and parse a .csv? Instead of manually constructing a giant config object, why not parse YAML? Instead of manually constructing nodes and an adjacency graph, why not parse DOT? Possibly the most obscure DSL I’ve ever written is for organic chemistry reaction workups!

Conclusion

To defluff is to be human. We describe weather as sunny, cloudy, or rainy without having to specify temperature, humidity, cloud cover, or wind conditions. If a concept has been around for more than a decade, chances are, a very compact DSL already exists for it, and you won’t have to invent a new one.

Fluffy unit tests are annoying to read and write, but more than that, they discourage writing more unit tests. By investing in methods to defluff object constructors, it becomes a lot easier to write comprehensive unit test suites, and the unit tests become far easier to manually verify. As a bonus, factory methods and DSLs often end up quite useful outside of writing test cases, too - they make it easier to write tutorial notebooks or to construct ad-hoc objects during a debugging session.