by ezyang 5529 days ago
In my opinion, this code comments itself, even more so than a manually written-out parser would.

Here's how I would think about it. What would a grammar for Lisp look like?

    value := list | number | symbol
    list := '(' value + ')'
    number := [0-9]+
    symbol := [A-Za-z-]+

OK, great. Now how do I look at this code?

    value :: Parser Value
    value =
          List <$> (char8 '(' *> sepBy value (takeWhile1 isSpace_w8) <* char8 ')')
      <|> Number . fst . fromJust . B.readInteger <$> takeWhile1 isDigit_w8
      <|> Symbol <$> takeWhile1 (inClass "A-Za-z\\-")

Even if I don't understand the combinators, I can blank them out for now and focus on the recognizable semantic bits. I see List, Number and Symbol, OK, so my guess is that <|> does something like a | might in my grammar, and each line corresponds to a different way a value can be expressed. For list, I see some quoted '(' and ')', so I guess that 'char8' means "match this literal." In fact, I can just read all of that off, and it makes sense. Never mind what *> and <* are doing; I'll ignore that for now. And so forth.
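
To make that reading concrete, here is a minimal self-contained sketch of the same parser. The imports and the Value type are my assumptions, since the snippet above doesn't show them. I've also swapped in the Char-based char, isSpace and isDigit from Data.Attoparsec.ByteString.Char8 in place of char8, isSpace_w8 and isDigit_w8, so that everything comes from one module.

```haskell
import Control.Applicative ((<|>))
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8
  (Parser, char, inClass, isDigit, isSpace, parseOnly, sepBy, takeWhile1)
import Data.Maybe (fromJust)

-- Assumed AST; the original snippet only shows the three constructors.
data Value
  = List [Value]
  | Number Integer
  | Symbol B.ByteString
  deriving (Show, Eq)

value :: Parser Value
value =
      -- list := '(' value* ')', with values separated by whitespace
      List <$> (char '(' *> sepBy value (takeWhile1 isSpace) <* char ')')
      -- number := [0-9]+ (fromJust is safe here: the input is all digits)
  <|> Number . fst . fromJust . B.readInteger <$> takeWhile1 isDigit
      -- symbol := [A-Za-z-]+ (a trailing '-' in the class is a literal dash)
  <|> Symbol <$> takeWhile1 (inClass "A-Za-z-")
```

With this, parseOnly value (B.pack "(add 1 (mul 2 3))") should give back a List holding Symbol "add", Number 1, and a nested List for the inner expression.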

Suppose I wanted to make a superficial change, like making $ a recognized symbol character in the language. That's super easy. I don't even need to look at the docs. What if I want to introduce a new syntactic construct? That's a little harder; I'll have to go check the attoparsec docs. But you'd have to look up the docs for a library in any language, anyway.
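
For instance, the $ change is probably just one more character in the class string. A rough sketch, where symbolChars and symbolName are names I made up for illustration, and the Char-based inClass from Data.Attoparsec.ByteString.Char8 is assumed:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8 (Parser, inClass, parseOnly, takeWhile1)

-- Hypothetical name: the characters allowed in a symbol.
-- '$' is new; the '-' stays last so it is read as a literal dash, not a range.
symbolChars :: String
symbolChars = "A-Za-z$-"

-- Hypothetical name: just the symbol branch, returning the raw bytes.
symbolName :: Parser B.ByteString
symbolName = takeWhile1 (inClass symbolChars)
```

Here parseOnly symbolName (B.pack "$total-cost") should accept the whole input, while an input starting with a digit is still rejected.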

All is not well, though: in particular, this code conflates the tokenizing and parsing steps. (Notice that I don't say anything about whitespace in my grammar, but there's some line noise here dealing with it.) That decreases the readability a little, but you gain some efficiency in exchange.
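
One common way to push the whitespace out of sight is a small "lexeme" wrapper that eats trailing whitespace after each token, so the parser reads closer to the grammar. This is only a sketch of that idea, not what the original code does: lexeme is a helper I'm introducing, the Value type is assumed, and I use attoparsec's decimal for numbers.

```haskell
import Control.Applicative (many, (<|>))
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, inClass, parseOnly, skipSpace, takeWhile1)

-- Assumed AST, matching the constructors used in the snippet above.
data Value
  = List [Value]
  | Number Integer
  | Symbol B.ByteString
  deriving (Show, Eq)

-- Hypothetical helper: run a token parser, then discard trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipSpace

value :: Parser Value
value =
      List   <$> (lexeme (char '(') *> many value <* lexeme (char ')'))
  <|> Number <$> lexeme decimal
  <|> Symbol <$> lexeme (takeWhile1 (inClass "A-Za-z-"))
```

The catch is that whitespace between tokens becomes optional, so "12abc" would parse as a number followed by a symbol inside a list. The original's sepBy version requires the separator, which is stricter.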

Maybe there's a tradeoff: the use of a library and symbolic combinators makes it harder to tell precisely how the code manages to actually do any parsing. That's true of any abstraction. But what's really great about this is that I can easily tell what the big picture of the code is. A page or two of state machine would not do that for me!