Originally posted 2022-02-22

Tagged: computer_science

Obligatory disclaimer: all opinions are mine and not of my employer

Over the 2021 Christmas season, I worked through problems 1-15 of the Advent of Code in Haskell. For each problem, I tried to do it “correctly” in Haskell, rather than hacking my way through the problem. I learned a ton and had a lot of fun. You can see my solutions on GitHub.

I didn’t finish AoC, but the questions I completed covered a diverse array of programming concepts: iterators (day 1), fold/reduce (day 2, 3), state machines (day 4), multidimensional arrays (day 5, 9, 11, 12), math/logic (day 6, 7, 8, 13, 14), flood fill (day 9), stacks (day 10), BFS/DFS (day 12), string munging (day 14), Djikstra’s (day 15), parsers/binary protocols (day 16). I initially misdiagnosed day 15 as a memoization problem, so I also read the “PhD thesis” on memoization in Haskell before realizing I’d goofed. (You think I’m goofing about the PhD thesis… I’m not. More on this below…)

Day 16 was parsers so I ended up exploring the rabbit hole of parsers, for which Haskell is the undisputed #1 language. I didn’t find my way out of that rabbit hole and ran out of vacation days.

## Tools and Workflow

The Glasgow Haskell Compiler (GHC) is the closest thing I have seen to the mythical “sufficiently smart compiler”. The way this works is that Haskell forces you to write in a mathematically pure way that the compiler can use to make mathematically guaranteed transformations. No undefined behavior here. Most people don’t think in math, which is probably why Haskell will never be mainstream. Having said that, your experience with Haskell may surprise you even if you don’t think of yourself as a math person. There are many things to learn here, not just ivory tower programming language theory.

The tools I used for this exercise were VSCode, HLS, hlint, and stack.

### The good: hlint

The easiest place to see Haskell’s compiler magic is in the hlint plugin, which actually exposes some of these compiler passes to you.

For example, (map f3) . (map f2) . (map f1) list takes a list of values and makes three passes over the list, applying a function each time. Haskell is smart enough to rewrite this code as the single pass version: map (f3 . f2 . f1) list. hlint exposes this rewrite rule, letting you know that you can simplify your code.

I must confess that I abused hlint by throwing really bad, hacky code at it. Hlint obliged me by telling me exactly how to simplify the code using idiomatic Haskell. In one instance, I hacked together a gnarly if-else tree, which hlint converted to the guard clauses whose syntax I hadn’t yet mastered.

-- before
checkNotVisited path node =
if all isUpper node then True
else if node == "start" then False
else if node == "end" then True
else if hasVisitedSmallTwice path
then countOccurrences node path < 2
else countOccurrences node path <= 2

-- after hlint-suggested rewrite
checkNotVisited path node
| all isUpper node = True
| node == "start" = False
| node == "end" = True
| hasVisitedSmallTwice path = countOccurrences node path < 2
| otherwise = countOccurrences node path <= 2

hlint is far more than just a linter; it is itself a compiler, but it generates Haskell code rather than machine code.

What’s the name of the function that returns the first element of a 2-tuple? I don’t know, but Hoogle does. Searching for the type signature (a, b) -> a tells me it’s fst. Hoogle also lets you look up the line noise like <$> or >>= you encounter. It’s a pretty indispensable tool. ### The meh: printf debugging I encountered the cryptic error Prelude.!!: index too large one day. (!! is the indexing operator for lists). I wondered why Haskell didn’t just tell me what the list and the attempted index was, as is good practice in writing error messages. It turns out that Haskell can’t actually tell me because of tail recursion optimizations. Specifically, if I were trying to index the 17th element of a length-10 list, Haskell will tail-recurse down to “get the 7th element of an empty list” before throwing the “index too large” error - so it no longer has the information needed to give a proper error message! Similarly, enabling stack traces was near impossible, ironically so for a build tool named “stack”… It turns out that Haskellers mostly rely on traceShowId, a function that implements the identity fn, with the side effect of printing the value to stdout. Before I learned about traceShowId, I raged at having to convert every function in my code into the IO monad to enable printing to stdout (the “what color is your function” problem). traceShowId uses unsafePerformIO to sneak IO side-effects past the Haskell compiler and to create better user ergonomics, not unlike Rust’s unsafe. I didn’t play with GHCi’s interactive debugger, which was a missed opportunity. ## Functional programming, distilled I’m quite familiar with FP ideas, mostly in the context of Python’s functools module, to the point where I often knew what I wanted, but I just didn’t know what it was called. (Hoogle to the rescue here!) But Haskell is functional programming distilled. One example: In Haskell, function application is itself a function named $. The only way I can explain this idea in Python is by proposing the operator.call method, such that operator.call(f, *args, **kwargs) == getattr(f, '__call__')(*args, **kwargs). In Python, this concept is so obscure that it isn’t even in the stdlib; in Haskell, this concept gets the privilege of a single-character sigil!

What does this buy you? For one, it enables a beautiful symmetry: you can map one function over many values, or map many functions over one value!

>>> map (1+) [0, 1, 2, 3]
[1, 2, 3, 4]

## Cheating with the Runtime

By far my most frustrating experience with Haskell was the seemingly trivial task of memoizing a function. You see, the idea that one might mutate a cache with the result of a function crosses a big red line in Haskell, and requires the invocation of the State Transformer Monad if you wanted to just do the obvious thing. Every function that calls this memoized function would itself invoke the STMonad since calling those functions could potentially trigger state mutation. (Callback to the “what color is your function” problem…). Of course, memoization is ultimately a monad, so this is the correct way to do it, albeit annoying.

The Haskeller’s solution to this annoyance is a mind-bending abuse of Haskell’s runtime, that reminded me of the infamous sleepsort. You see, Haskell allows the concept of a lazily evaluated infinite data structure. This is implemented by storing a thunk, which is a pending computation. If you want the next element, execute that thunk to get your next item, and possibly another thunk. The trick here is that Haskell’s thunk management runtime is exactly what we want in a memoization utility: the ability to mutate state (here, the thunk runtime’s state) upon the first invocation of a function without futzing around with STMonad.

The classic demonstration of this hack is the Fibonacci numbers as an infinite list, where evaluating each element of the list returns not only the Fibonacci number, but the continuation thunk that continues the pattern through a circular reference.

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Now, this is a bit of a problem on various fronts. First, if you want fibs(n), you need to evaluate all the values from 0...n. This is okay and necessary for Fibonacci numbers, but in general there is no reason one should have to evaluate n function calls when you just wanted to evaluate and cache the 1 function call. You can imagine yet another wrapper to indicate thunks that have been instantiated but not yet executed, but accessing the cached value would still require O(n) pointer traversals. The fix to this problem is to convert your infinite list into an infinite binary trie over integers, which reduces the pointer traversal to O(log n). This technique is widely known to early Lisp programmers. As for how to cache things other than one-argument integer functions, all you need is a bijection between arbitrary objects and integers, which isn’t that hard to pull off (just use the bytes representation of a string, for example).

These ideas are concretized in the Data.MemoCombinators library, among others. It repeatedly blows my mind how concise these implementations tend to be. The heavy lifting is done by literally two lines of code, which say, “Take any function whose arguments can be turned into bits, and create a version of this function backed by an infinite tree whose nodes are the integers, pointing to the unevaluated thunk for f”.

-- | Memoize an ordered type with a bits instance.
bits :: (Num a, Ord a, Bits a) => Memo a
bits f = IntTrie.apply (fmap f IntTrie.identity)

and the rest of the module is just defining the bijection operator and bijections between bits and commonly used types, like integers/tuples/Maybe/strings/bool/Enum.

Still, we haven’t even begun thinking about fixed-size caches and eviction policies, which are kind of important for any real system… To be clear, you can implement hash tables in Haskell as they are implemented in most mainstream languages, it’s just that mutating them while maintaining functional purity creates a difficult choice: either wrap every hash table access in unsafePerformIO to bypass all of Haskell’s language guarantees, or copy the entire hash table on each mutation, defeating the purpose of a hash table. On the other hand, trees let you reuse subtrees, enabling updates in O(log n) while retaining functional purity.

## Takeaways

I’d superficially known that there were two parallel universes of computation, one based on the array-of-bytes reality that is our Von Neumann architecture, and another based on the purely mathematical lambda calculus. Nearly everybody in software lives in Von Neumann land and the programming languages we use reflect this bias.

Learning Haskell let me understand the sophistication and potential of lambda calculus. And yet, lambda calculus feels to me like an island resort, somewhere that’s fun to visit, but would never be your primary residence.

Personally, I like to understand how my tools work, and to me, the perfect language/API/DSL is one that transparently organizes the underlying reality into simple yet accurate models. C exposes the underlying reality but fails to impose any reduction in complexity. Haskell creates simple models that don’t reflect the underlying reality, and it is only through the magical sufficiently smart compiler that is GHC that Haskell can run performantly on Von Neumann architectures. I feel most at home in the APL world, which melds lambda calculus with Von Neumann architectures. You probably know APL through its descendants in the NumPy family, although most don’t realize this history. I’ve been told by several people that NumPy is a pretty botched implementation of APL ideas, so my next side project may be digging into the original APL.

Many thanks to Alexey Radul for putting up with all of my Haskell questions and trying to explain to me why memoization in Haskell isn’t as silly as I think it is (he was unsuccessful). Thanks as well to Vaibhav Sagar for helping me understand the basics of Haskell’s toolchain and build ecosystem.