Hacker News new | ask | show | jobs
by geal 3281 days ago
(one of the authors here): parser generators are generally good for one thing: parsing programming languages. For more complex formats, where you have to carry state around, or binary formats, they're extremely cumbersome to use. I often meet people that tell me they want the parser generator to end all parsers. But for most real world formats, you'll have to hack around the generator's limitations. Theoretically, parser combinators are less expressive, but they're just functions: you can write whatever you want inside a subparser, and integrate it with the rest. That approach works well, since we wrote a lot of complex parsers with nom.
2 comments

I thought most real world compilers tend to be hand written rather than using a generator? By no means an expert but that's something I've heard and know to be true for many real world compilers.
They probably should have used this instead: http://scottmcpeak.com/elkhound/
For C and especially C++, this makes a lot of sense. There's so much context dependence in the latter when you get to templates.
Ya, heck, not just C++, but scala and C#. You can tune the heck out of error recovery when you write a parser by hand, as well as support IDE services.

Using a combinator or generator makes sense when you really care just about getting trees out as fast as possible; e.g. what a command line batch compiler or interpreter does.

I suppose that's because they take the standpoint that they have so much testing code available, that any conflicts with the grammar are easily found.

However, in general, you can easily shoot yourself in the foot with a handwritten parser, because you can't see the conflicts by looking at the code locally.

It's usually because getting proper (as in user-friendly) error reporting is a massively weak spot with ever parser generator I've seen to date.

Conflicts is generally one of the least interesting problems in writing a parser unless you're designing a new language, and especially when hand writing recursive descent parsers, potential conflicts/ambiguities tend to become obvious quickly.

For prototyping a new language, sure, use a parser generator to test the grammar.

I too fall in the category of loving the idea of parser generator (and having written a few) but always falling back on hand-written parsers because the generators I've seen have all been inadequate. I hope that chances some day.

> For more complex formats, where you have to carry state around, or binary formats, they're extremely cumbersome to use.

That's not a theoretical issue, but more a practical issue.

> you'll have to hack around the generator's limitations

You're probably thinking of LALR(1) class generators. But we have GLR parser generators for a long time now, and they are very flexible and have little limitations. See e.g. [1].

[1] http://scottmcpeak.com/elkhound/

How do you parse ASN.1 with this generator? Or JPEG?
Or especially Semantic Designs' tool that gets a ton of mileage out of GLR:

http://semanticdesigns.com/Products/DMS/DMSToolkit.html