Hacker News new | ask | show | jobs
by throwaway2016a 3268 days ago
I love seeing articles like this... my compiler design course in college was one of my favorite. Also, one of the most useful. Parsers and lexers are useful in so many places besides just code compilers.

With that said, I think there is one small issue in the article:

> The quintessential first step in any compiler is parsing the source string into an Abstract Syntax Tree

If there is a quintessential first step in writing a compiler it is doing lexical analysis with a lexer to break the program up into tokens. Then using a parser to create the AST. While you could go straight from the raw text to parsing, it makes it a lot easier if you lex it first.

3 comments

You are right and calling it quintessential might be too much. Local lexing seems interesting, but the approach I used is called combinatorial parsing and it does parsing and lexing in one go because they are so intertwined anyway. This also lets me get to the AST in just one pass over the input string with very little backtracking.

I'll update the post with this feedback.Thank you.

Well you can do both at the same time, it is called local lexing: https://arxiv.org/abs/1702.03277
I did mention in my post that the separate lexing step is not strictly necessary. I was just pointing out that a lexer follows the definition of "quintessential" more than parser does.

That paper is very recent. It sounds like it is still doing both steps just concurrently and with feedback. It isn't skipping the lexing step altogether. The benefit of separation is decreased code complexity and easier debugging.

Yes, both stages are still present in that approach, and that is a good thing, because clearly it is useful to separate lexing and parsing. It's just that the two stages are intertwined and thus lexing can be dependent on the parsing context. So it is not really going from "raw text to parsing".
I think lexing is unnecessary with some methods, e.g. parser combinators and (I think) Might’s “Parsing with Derivatives”. And there are other tactics, like the other commenter mentioned.
There are alternatives, certainly. And lisp style languages are especially easy to parse without a lexer. I was just taking issue to the term "quintessential" being used. In a classical sense, the lexer comes first.
It depends on your mental hierarchy doesn't it? It's easy enough to see lexing as a sub-step of parsing. It's also quite arguable that the classical emphasis of a separation between parsing and lexing is as much an artifact of classical tools than a "natural need".

For instance, from a Chomsky's hierarchy of languages perspective lexing and parsing are simply language analysis at two different levels on the hierarchy with lexing focused on the "regular" language embedded within a larger context-free or context-sensitive language.