Hacker News new | ask | show | jobs
by snprbob86 5529 days ago
I'm familiar with parser combinators, but not the Attoparsec or other libraries. Furthermore, I have limited Haskel experience, so I'm struggling to read this. In particular, the use of symbols and overall terseness are hard to get through. I'm also really uncomfortable with the order of operations of all these operators.

Now, I know that Hoogle exists, so I'm able to look these things up, but it's pretty tough to untangle. Particularly when searching for

  *>
I found something that reads "This module describes a structure intermediate between a functor and a monad: it provides pure expressions and sequencing, but no binding. (Technically, a strong lax monoidal functor.)" Which is so full of jargon, my head hurts. And I even know what most of that jargon means!

So as far as I can tell, here's what's going on....

  Foo <$>
Defines a grammar production using the left argument as a constructor on the matched value of the right argument.

  <|>
Separates productions as alternatives; in a PEG Grammer style. Both the <$> and <|> operators combine functions to result in a big master function that takes input and has the result type that is the big `one of A, B, C` algebraic type that is defined earlier: "Value".

  char8
Matches a literal character (passed as an argument).

  char8 '(' *>
Reads a parenthesis and throws it out of the result set. Similar for the close parenthesis later on.

  takeWhile1 isSpace_w8
Produces a parser that consumes whitespace.

  sepBy value (takeWhile1 isSpace_w8)
Combines to produce a parser which reads a list of values (recursive here) and presumably eliminates the separators from the result.

  takeWhile1 isDigit_w8
Similar story, reads an integer.

  Number . fst . fromJust . B.readInteger
Is a composed constructor that may or may not parse an integer, forcibly assumes it will succeed to parse one (which we know because we're reading digits). That integer parse stops when it hits a non-digit, and the unconsumed characters are returned as the second element of a tuple, so we call fst to take the first element of the tuple, discarding the unread characters (the main parser will consume those).

  takeWhile1 (inClass "A-Za-z\\-")
Similar story again to the whitespace and numbers.

OK, yeah, so.... amazing. Incredibly brilliant little bit of code. But holy hell was that tough to read. And I have no idea if I'd be able to write it in only a few hours. I'm far from certain I'd ever learn to write it as fast or faster than something more verbose.

Please let me know if I've misunderstood something....

3 comments

"I'm familiar with parser combinators, but not the Attoparsec or other libraries. Furthermore, I have limited Haskel experience, so I'm struggling to read this. In particular, the use of symbols and overall terseness are hard to get through. I'm also really uncomfortable with the order of operations of all these operators."

This is why I've never identified with "code should comment itself".

I'm not afraid to admit it: I am a dumb programmer. As a dumb programmer, the only way for me to be a good programmer is to be effective.

To be effective, I need to minimize the "time spent understanding code" phase of software development. Furthermore, I will likely be working within a team.

The opposite of that is to be writing "prototype" code --- code which is written to explore the problem space and to gain a better understanding of specific patterns to best accomplish a goal.

But once the prototype phase is over, all of that code is deleted, and is replaced with commented code. (If possible, I try to explain each step of the algorithm in English, first, and then write the code. This is slower, but results in far fewer bugs and a more solid foundation.)

I look at it this way: when my life ends, either I will have gone as far as my own brain was able to take me --- or I will have gone as far as my team's brains were able to take us. I'm willing bet my life that teams of "less brilliant" people are more effective than individuals of excessive brilliance. I write my code accordingly.

-------------------

That said, there is no excuse for laziness. Even if I am merely "competent", it's important for me to strive to be brilliant, even if I'll never attain it.

In my opinion, this code comments itself, even more so than a manually, written-out parser would.

Here's how I would think about it. What would a grammar for Lisp look like?

    value := list | number | symbol
    list := '(' value + ')'
    number := [0-9]+
    symbol := [A-Za-z-]+
OK, great. Now how do I look at this code?

    value :: Parser Value
    value =
          List <$> (char8 '(' *> sepBy value (takeWhile1 isSpace_w8) <* char8 ')')
      <|> Number . fst . fromJust . B.readInteger <$> takeWhile1 isDigit_w8
      <|> Symbol <$> takeWhile1 (inClass "A-Za-z\\-")
Even if I don't understand the combinators, I can blank them out for now and focus on the recognizable semantic bits. I see List, Number and Symbol, ok, so my guess is that <|> does something like a | might in my grammar, and each line corresponds to a different way a value can be expressed. For list, I see some quoted '(' and ')', so I guess that 'char8' means "match this literal." In fact, I can just read all of that off, and it makes sense. Nevermind what > and < are doing, I'll ignore that for now. And so forth.

Suppose I wanted to make a superficial change, like make $ a recognized symbol in the language. That's super easy. I don't even need to look at the docs. If I want to introduce a new syntactic construct? A little harder; I'll have to go check the attoparsec docs. But you'd have to look up the docs for a library in any language, anyway.

All's not well; in particular, this code conflates the tokenizing and parsing steps (notice that I don't say anything about whitespace in my grammar, but there's some line-noise here dealing with it.) That decreases the readability a little, but you gain efficiency with it.

Maybe there's a tradeoff: the use of a library and symbolic combinators makes it harder to tell precisely how the code manages to actually do any parsing. That's true of any abstraction. But what's really great about this is that I can easily tell what the big picture of the code is. A page or two of state machine would not do that for me!

The syntax elements that you struggled with are explained here: http://learnyouahaskell.com/functors-applicative-functors-an...

They are so incredibly general that you can use them in a lot of places, so the investment to learn their meaning pays off quickly. Learn You A Haskell actually explains them _before_ explaining Monads.

When you've worked with Haskell a bit this code is actually very readable. Admittedly some experience is required.

With a two named helper functions and a bit of additional alignment everyone that has ever written a parser before in Haskell immediately recognizes the structure, thanks to use of applicative functor syntax (<|> and <$>):

    value :: Parser Value
    value =  List   <$> parenthesis (sepBy value (takeWhile1 isSpace_w8))
         <|> Number <$> parseNumber
         <|> Symbol <$> takeWhile1 (inClass "A-Za-z\\-")
    
      where parenthesis p = char8 '(' *> p <* char8 ')'
            parseNumber   = fst . fromJust . B.readInteger <$> takeWhile1 isDigit_w8