2021 Advent of Code in Haskell
Originally posted 2022-02-22
Tagged: computer science
Obligatory disclaimer: all opinions are mine and not of my employer
Over the 2021 Christmas season, I worked through problems 1-15 of the Advent of Code in Haskell. For each problem, I tried to do it “correctly” in Haskell, rather than hacking my way through the problem. I learned a ton and had a lot of fun. You can see my solutions on GitHub.
I didn’t finish AoC, but the questions I completed covered a diverse array of programming concepts: iterators (day 1), fold/reduce (day 2, 3), state machines (day 4), multidimensional arrays (day 5, 9, 11, 12), math/logic (day 6, 7, 8, 13, 14), flood fill (day 9), stacks (day 10), BFS/DFS (day 12), string munging (day 14), Dijkstra’s (day 15), parsers/binary protocols (day 16). I initially misdiagnosed day 15 as a memoization problem, so I also read the “PhD thesis” on memoization in Haskell before realizing I’d goofed. (You think I’m goofing about the PhD thesis… I’m not. More on this below…)
Day 16 was parsers so I ended up exploring the rabbit hole of parsers, for which Haskell is the undisputed #1 language. I didn’t find my way out of that rabbit hole and ran out of vacation days.
Tools and Workflow
The Glasgow Haskell Compiler (GHC) is the closest thing I have seen to the mythical “sufficiently smart compiler”. The way this works is that Haskell forces you to write in a mathematically pure way that the compiler can use to make mathematically guaranteed transformations. No undefined behavior here. Most people don’t think in math, which is probably why Haskell will never be mainstream. Having said that, your experience with Haskell may surprise you even if you don’t think of yourself as a math person. There are many things to learn here, not just ivory tower programming language theory.
The tools I used for this exercise were VSCode, HLS, hlint, and stack.
The good: hlint
The easiest place to see Haskell’s compiler magic is in the hlint plugin, which actually exposes some of these compiler passes to you.
For example, (map f3) . (map f2) . (map f1) list takes a list of values and makes three passes over the list, applying a function each time. Haskell is smart enough to rewrite this code as the single-pass version: map (f3 . f2 . f1) list. hlint exposes this rewrite rule, letting you know that you can simplify your code.
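To make that concrete, here’s a small sketch with placeholder stages ((+ 1), (* 2), and (+ 3) are just stand-ins for f1, f2, f3):
-- Three passes over the list versus one fused pass; both give the same result.
threePass, onePass :: [Int] -> [Int]
threePass = map (+ 3) . map (* 2) . map (+ 1)
onePass   = map ((+ 3) . (* 2) . (+ 1))
-- threePass [0, 1, 2] == onePass [0, 1, 2] == [5, 7, 9]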
I must confess that I abused hlint by throwing really bad, hacky code at it. Hlint obliged me by telling me exactly how to simplify the code using idiomatic Haskell. In one instance, I hacked together a gnarly if-else tree, which hlint converted to the guard clauses whose syntax I hadn’t yet mastered.
-- before
checkNotVisited path node =
  if all isUpper node then True
  else if node == "start" then False
  else if node == "end" then True
  else if hasVisitedSmallTwice path
    then countOccurrences node path < 2
    else countOccurrences node path <= 2

-- after hlint-suggested rewrite
checkNotVisited path node
  | all isUpper node = True
  | node == "start" = False
  | node == "end" = True
  | hasVisitedSmallTwice path = countOccurrences node path < 2
  | otherwise = countOccurrences node path <= 2
hlint is far more than just a linter; it is itself a compiler, but it generates Haskell code rather than machine code.
More good: Hoogle
What’s the name of the function that returns the first element of a 2-tuple? I don’t know, but Hoogle does. Searching for the type signature (a, b) -> a tells me it’s fst. Hoogle also lets you look up the line noise like <$> or >>= you encounter. It’s a pretty indispensable tool.
The meh: printf debugging
I encountered the cryptic error Prelude.!!: index too large one day. (!! is the indexing operator for lists.) I wondered why Haskell didn’t just tell me what the list and the attempted index were, as is good practice when writing error messages. It turns out that Haskell can’t actually tell me because of tail recursion optimizations. Specifically, if I were trying to index the 17th element of a length-10 list, Haskell will tail-recurse down to “get the 7th element of an empty list” before throwing the “index too large” error - so it no longer has the information needed to give a proper error message!
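You can see why from a simplified sketch of the operator (myIndex here is a stand-in for the real !!): by the time the empty-list case is reached, the original list and index are long gone.
-- Simplified sketch of list indexing; the recursion discards the context
-- needed for a helpful error message.
myIndex :: [a] -> Int -> a
myIndex (x : _)  0 = x
myIndex (_ : xs) n = myIndex xs (n - 1)
myIndex []       _ = error "index too large"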
Similarly, enabling stack traces was near impossible, ironically so for a build tool named “stack”… It turns out that Haskellers mostly rely on traceShowId, a function that implements the identity function, with the side effect of printing the value to stderr. Before I learned about traceShowId, I raged at having to convert every function in my code into the IO monad to enable printing to stdout (the “what color is your function” problem). traceShowId uses unsafePerformIO to sneak IO side effects past the Haskell compiler and to create better user ergonomics, not unlike Rust’s unsafe.
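Here’s a minimal sketch of how it’s used (scoreOfBest is a made-up example): you drop traceShowId into the middle of a pure pipeline and it passes the value through unchanged while printing it.
import Debug.Trace (traceShowId)

-- traceShowId :: Show a => a -> a; the identity, plus printing the value it sees.
scoreOfBest :: [Int] -> Int
scoreOfBest = (* 10) . traceShowId . maximum
-- scoreOfBest [3, 1, 4] prints 4 (to stderr) and returns 40.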
I didn’t play with GHCi’s interactive debugger, which was a missed opportunity.
Functional programming, distilled
I’m quite familiar with FP ideas, mostly in the context of Python’s functools module, to the point where I often knew what I wanted, but I just didn’t know what it was called. (Hoogle to the rescue here!) But Haskell is functional programming distilled.
One example: In Haskell, function application is itself a function named $. The only way I can explain this idea in Python is by proposing the operator.call method, such that operator.call(f, *args, **kwargs) == getattr(f, '__call__')(*args, **kwargs). In Python, this concept is so obscure that it isn’t even in the stdlib; in Haskell, this concept gets the privilege of a single-character sigil!
What does this buy you? For one, it enables a beautiful symmetry: you can map one function over many values, or map many functions over one value!
>>> map (1+) [0, 1, 2, 3]
[1, 2, 3, 4]
>>> map ($ 3) [(1+), (2*), (2 **)]
[4.0, 6.0, 8.0]
Why is this useful? A concrete example: when converting a single-task neural network library to support multitask training, we needed to upgrade a LossFunction to a List[LossFunction], propagating this change throughout the codebase. In Haskell, this refactor would have been near trivial. But in Python, we ended up crawling through the codebase, rewriting a variety of call sites. We can of course implement the idea of mapping a list of functions in Python; it would just be… not as nice as its Haskell equivalent.
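As a hedged sketch of what the Haskell side of that refactor might look like (LossFn, mse, mae, and multitaskLosses are all made-up names, not the actual library):
type LossFn = Double -> Double -> Double

mse, mae :: LossFn
mse prediction target = (prediction - target) ^ 2
mae prediction target = abs (prediction - target)

-- One prediction/target pair, many loss functions: just map over the functions.
multitaskLosses :: [LossFn] -> Double -> Double -> [Double]
multitaskLosses losses prediction target = map (\f -> f prediction target) losses
-- multitaskLosses [mse, mae] 1.5 1.0 == [0.25, 0.5]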
In Haskell, there are many powerful higher-order functions that simply can’t exist in other languages for lack of language support. For example…
Monads
this section formerly contained a monad tutorial (You’re welcome.)
I cut out most of the Monad tutorial, but I’ll mention one cool thing that Monads get you: the ability to trivially convert a function’s “color”. <$> is like the normal function application $, but it converts the caller function’s color. So, if you have a Promise[Int], you can convert it to a Promise[String] with exactly the same function you would use to convert an int to a string. Just use <$> instead of $. (Even more mindblowing: <$> is polymorphic across monads, so it works on any kind of monad, including user-defined ones.)
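A small illustration, using Maybe and IO as stand-ins for the Promise above: the same show function works on a bare Int with $, and on a wrapped Int with <$>.
plain :: String
plain = show $ (42 :: Int)           -- "42"

inMaybe :: Maybe String
inMaybe = show <$> Just (42 :: Int)  -- Just "42"

inIO :: IO String
inIO = show <$> pure (42 :: Int)     -- an IO action that produces "42"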
Cheating with the Runtime
By far my most frustrating experience with Haskell was the seemingly trivial task of memoizing a function. You see, the idea that one might mutate a cache with the result of a function crosses a big red line in Haskell, and requires the invocation of the State Transformer Monad if you want to just do the obvious thing. Every function that calls this memoized function would itself invoke the STMonad, since calling those functions could potentially trigger state mutation. (Callback to the “what color is your function” problem…) Of course, memoization is ultimately a monad, so this is the correct way to do it, albeit annoying.
The Haskeller’s solution to this annoyance is a mind-bending abuse of Haskell’s runtime that reminded me of the infamous sleepsort. Haskell allows the concept of a lazily evaluated infinite data structure, implemented by storing a thunk, which is a pending computation. If you want the next element, execute that thunk to get your next item, and possibly another thunk. The trick here is that Haskell’s thunk management runtime is exactly what we want in a memoization utility: the ability to mutate state (here, the thunk runtime’s state) upon the first invocation of a function, without futzing around with STMonad.
The classic demonstration of this hack is the Fibonacci numbers as an infinite list, where evaluating each element of the list returns not only the Fibonacci number but also the thunk that continues the pattern through a circular reference.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
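For example (assuming the fibs definition above is in scope), forcing an element also forces and caches every cell before it, so later lookups reuse the work:
main :: IO ()
main = do
  print (take 10 fibs)  -- [0,1,1,2,3,5,8,13,21,34]
  print (fibs !! 50)    -- the already-forced list cells act as the memo table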
Now, this is a bit of a problem on various fronts. First, if you want fibs(n), you need to evaluate all the values from 0...n. This is okay and necessary for Fibonacci numbers, but in general there is no reason one should have to evaluate n function calls when you just wanted to evaluate and cache one function call. You can imagine yet another wrapper to indicate thunks that have been instantiated but not yet executed, but accessing the cached value would still require O(n) pointer traversals. The fix to this problem is to convert your infinite list into an infinite binary trie over the integers, which reduces the pointer traversal to O(log n). This technique was widely known to early Lisp programmers. As for how to cache things other than one-argument integer functions, all you need is a bijection between arbitrary objects and integers, which isn’t that hard to pull off (just use the bytes representation of a string, for example).
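For the list-based version, the whole memoizer is a one-liner (a sketch; memoList is a made-up name, and it has the O(n) lookup problem described above):
-- Wrap a function over the naturals in a lazy list; cells are cached once forced.
memoList :: (Int -> a) -> (Int -> a)
memoList f = (table !!)
  where table = map f [0 ..]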
These ideas are concretized in the Data.MemoCombinators library, among others. It repeatedly blows my mind how concise these implementations tend to be. The heavy lifting is done by literally two lines of code, which say, “Take any function whose arguments can be turned into bits, and create a version of this function backed by an infinite tree whose nodes are the integers, pointing to the unevaluated thunk for f”.
-- | Memoize an ordered type with a bits instance.
bits :: (Num a, Ord a, Bits a) => Memo a
bits f = IntTrie.apply (fmap f IntTrie.identity)
and the rest of the module is just defining the bijection operator and bijections between bits and commonly used types, like integers/tuples/Maybe/strings/bool/Enum.
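Putting it together, a usage sketch along the lines of the package’s own example: the naive Fibonacci recursion gets memoized just by wrapping it.
import qualified Data.MemoCombinators as Memo

fib :: Integer -> Integer
fib = Memo.integral fib'  -- memoized via the integer trie described above
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)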
Still, we haven’t even begun thinking about fixed-size caches and eviction policies, which are kind of important for any real system… To be clear, you can implement hash tables in Haskell as they are implemented in most mainstream languages; it’s just that mutating them while maintaining functional purity creates a difficult choice: either wrap every hash table access in unsafePerformIO to bypass all of Haskell’s language guarantees, or copy the entire hash table on each mutation, defeating the purpose of a hash table. On the other hand, trees let you reuse subtrees, enabling updates in O(log n) while retaining functional purity.
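This is the trade-off that, for example, Data.Map makes: insert returns a new map in O(log n) that shares structure with the old one, and the old one stays valid. A small sketch:
import qualified Data.Map.Strict as Map

example :: (Map.Map String Int, Map.Map String Int)
example = (old, new)
  where
    old = Map.fromList [("a", 1), ("b", 2)]
    new = Map.insert "c" 3 old  -- old is untouched; new shares old's subtrees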
Takeaways
I’d superficially known that there were two parallel universes of computation, one based on the array-of-bytes reality that is our Von Neumann architecture, and another based on the purely mathematical lambda calculus. Nearly everybody in software lives in Von Neumann land and the programming languages we use reflect this bias.
Learning Haskell let me understand the sophistication and potential of lambda calculus. And yet, lambda calculus feels to me like an island resort, somewhere that’s fun to visit, but would never be your primary residence.
Personally, I like to understand how my tools work, and to me, the perfect language/API/DSL is one that transparently organizes the underlying reality into simple yet accurate models. C exposes the underlying reality but fails to impose any reduction in complexity. Haskell creates simple models that don’t reflect the underlying reality, and it is only through the magical sufficiently smart compiler that is GHC that Haskell can run performantly on Von Neumann architectures. I feel most at home in the APL world, which melds lambda calculus with Von Neumann architectures. You probably know APL through its descendants in the NumPy family, although most don’t realize this history. I’ve been told by several people that NumPy is a pretty botched implementation of APL ideas, so my next side project may be digging into the original APL.
Many thanks to Alexey Radul for putting up with all of my Haskell questions and trying to explain to me why memoization in Haskell isn’t as silly as I think it is (he was unsuccessful). Thanks as well to Vaibhav Sagar for helping me understand the basics of Haskell’s toolchain and build ecosystem.