Hacker News new | ask | show | jobs
by haberman 3555 days ago
> If you implement a top-down recursive descent parser, you'll quickly learn how it maps to LL.

This will indeed tell you how top-down parsers work. But this doesn't tell you very much about LL specifically.

The trickiest part of parsing is deciding what path to take. When you hand-write a recursive descent parser, your code for deciding between alternatives is ad hoc. For example if you are parsing JSON you'll have some code that looks like this:

    if next_token == "{":
      parse_object()
    elif next_token == '[':
      parse_array()
    elif next_token == 'null':
      parse_null()
    elif next_token == 'true' or next_token == 'false':
      parse_boolean()
    elif next_token == '"':
      parse_string()
    else:
      parse_number()
A person writing a recursive descent parser has to invent this chain of "if" tests manually and convince him/herself that it is correct. For JSON it's simple because JSON is LL(1). For other languages it can be much more complicated, and can involve multiple tokens of lookahead.

Almost all of the "secret sauce" in LL, LR, etc. is how to construct these tables of if/then in a way that is rigorous and provably correct.

It's true that understanding generally how top-down and bottom-up parsers work is a great step towards understanding parsers.

1 comments

The first and follow sets for LL are the first hurdle you'll hit when you try to parse something where the next production (the next path) doesn't consume the token straight away. The theory becomes instantly accessible for practical reasons. So I respectfully disagree.

You'll quickly learn there's a mechanical transformation from LL(1) grammars to recursive descent. And if you write a recursive descent parser with one token of lookahead, no backtracking, and straightforward syntax actions, it has a mechanical translation into LL(1).

Anyway, I described the journey I went through learning compiler theory before I went to college. I covered undergrad compiler course content long before I left secondary school, and understood it at a much deeper level. I got a job at Borland working on the Delphi compiler (6 years I put in) on the back of the same knowledge. It worked for me.

Sorry for being a bit pedantic about it. People sometimes write hand-written recursive descent parsers and call them LL parsers, which is sorta true at some level, but misses IMO the key ingredient of automatically-constructed parse tables. I may have been a bit over-eager in reiterating this distinction.

How did you like working on Delphi? I wish I had been studying this stuff that early!

It was educational, to say the least. I ended up working mostly remotely (H-1B visa, when it came through, I didn't follow through on - I stayed in London with my GF). Apart from learning lots of implementation tricks in compilers (IDE / code completion support especially) and efficient code, as well as picking up experience in things that were more incidental to my immediate interests, like smart linking (turns out, it's very similar to garbage collection) and x64 disassembly for the 64-bit debugger.

More generally I learned how to tackle very complex problems under my own steam, even more so than I had as an autodidact. I have no difficulty drilling down to the next level, all the way to the CPU if necessary, if I have a problem, even if I'm starting out at the level of an interpreted language. I currently work on a web app written in Rails / Coffeescript combo; but I can debug MRI if we get an Ruby issue, and I can and have debugged MySQL when we had crashing issues.