Hacker News
by seanmcdirmid 2381 days ago
Error recovery and reporting are for sure the biggest challenge of a production parser. If you work out of band, you can play a lot of tricks in your parser to make it much more effective in handling errors. For example, braces can be matched in many languages without parsing any other constructs in the language, meaning these errors can all be reported and recovered from independent of the rest of the language. Then again, the overloading of < and > as operators and braces in many languages defeats that a bit :)
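
To make the brace-matching point concrete, here is a minimal Python sketch of an out-of-band bracket pass. It is an illustration, not code from any particular compiler: the name `check_braces` and the recovery policy (treat inner openers as unclosed when an outer closer shows up) are assumptions, and it deliberately ignores string literals, comments, and the `<`/`>` overloading problem mentioned above.

```python
from typing import List, Tuple

PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def check_braces(source: str) -> List[Tuple[int, str]]:
    """Return (offset, message) pairs for every bracket error found."""
    errors: List[Tuple[int, str]] = []
    stack: List[Tuple[int, str]] = []  # (offset, opener) still waiting for a closer
    for offset, ch in enumerate(source):
        if ch in OPENERS:
            stack.append((offset, ch))
        elif ch in PAIRS:
            wanted = PAIRS[ch]
            if stack and stack[-1][1] == wanted:
                stack.pop()  # well-nested pair
            elif any(opener == wanted for _, opener in stack):
                # Recovery (assumed policy): the closer matches an outer opener,
                # so report everything opened inside it as unclosed and carry on.
                while stack[-1][1] != wanted:
                    pos, opener = stack.pop()
                    errors.append((pos, f"unclosed '{opener}'"))
                stack.pop()
            else:
                # No opener of this kind is pending: report the closer and skip it.
                errors.append((offset, f"unmatched '{ch}'"))
    # Anything left on the stack was never closed.
    errors.extend((pos, f"unclosed '{opener}'") for pos, opener in stack)
    return sorted(errors)
```

Because the pass never looks at anything except brackets, it keeps going after the first error and can report several problems in one run, e.g. `check_braces("f(a[1)")` flags only the `[` at offset 3 and still pairs up the parentheses.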

Hand coding a parser is usually worth it for a moderately popular production language. Parser generators really shine for smaller language efforts that can’t afford the overhead and don’t really care about the niceties of decent error reporting.

1 comments

I don't think it's just error handling. A nice, human-readable form has a significant chance of being ambiguous. You can remove the ambiguity with a variety of transforms, putting it into Greibach Normal Form, and this resolves the ambiguities ... and translates pretty directly to a hand-written recursive descent parser (which you can also generate). But the thing is that here the code essentially serves as a lower level of abstraction, teasing out what you really mean. Which is to say there's nothing wrong with the code being the final spec; that way you don't have to keep the code in sync with the spec.
For mainstream programming languages, complicated ambiguities are hard on users anyway. By writing your parser by hand you have an extra incentive to keep the grammar simple (e.g. Scala). A lot of the power a parser generator gives you shouldn’t be used.
Remember, complicated ambiguities are different from complicated structures.

The Ruby programming language is a poster child for potential ambiguities that seem simple, such as allowing a conditional either before or after the statement it guards. Essentially, all the programming languages that are "human like" wind up like this, with the sort of ambiguities that people are comfortable with in natural language. This has a cost in terms of exact expression, but a lot of users think of this as being "easy on them".
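
As a toy illustration of the before/after conditional, here is a Python sketch that parses a made-up two-form statement syntax over a pre-split token list. This is not Ruby's actual grammar; `parse_statement` and the tuple shape it returns are invented for the example.

```python
from typing import List, Tuple, Union

# A statement is either a bare name or ("if", condition, body).
Node = Union[str, Tuple[str, str, str]]


def parse_statement(tokens: List[str]) -> Node:
    """statement := 'if' NAME 'then' NAME | NAME ['if' NAME]"""
    if tokens and tokens[0] == "if":
        # Prefix form: the keyword announces the conditional up front.
        if len(tokens) != 4 or tokens[2] != "then":
            raise SyntaxError("expected: if COND then BODY")
        return ("if", tokens[1], tokens[3])
    if len(tokens) == 1:
        return tokens[0]
    # Postfix form: the body has already been read before the 'if' shows up.
    if len(tokens) == 3 and tokens[1] == "if":
        return ("if", tokens[2], tokens[0])
    raise SyntaxError("expected: BODY [if COND]")
```

Both `["if", "ready", "then", "launch"]` and `["launch", "if", "ready"]` come out as the same tree. In the postfix case the parser only learns that the statement is conditional after it has consumed the body, which is the kind of thing that reads naturally to people but adds lookahead or backpatching work in a real grammar.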

Conversely, languages without grammatical ambiguities often seem irritatingly verbose to use: new users dislike Lisp's proliferation of parentheses, and I personally find Scala irritatingly verbose.

How much the complexity of a compiler-writing approach limits language complexity probably depends on what someone is familiar with. Some people can spit out an annotated YACC grammar pretty easily, whereas my head swims looking at the stuff. I can produce a recursive descent parser from a grammar pretty easily, however.
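
For readers who have not seen how directly a grammar maps to recursive descent, here is a minimal Python sketch for integer arithmetic. The grammar, the `Parser` class, and the choice to evaluate on the fly instead of building a tree are all assumptions made for the example. Left recursion has already been replaced by repetition, so each rule becomes one method:

```python
import re
from typing import List, Optional

# expr   := term   (('+' | '-') term)*
# term   := factor (('*' | '/') factor)*
# factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):  # the (...)* part of the rule
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            # Integer division keeps the example in whole numbers.
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self) -> int:
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str) -> int:
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this, `evaluate("1 + 2 * 3")` gives 7 and `evaluate("8 - 3 - 2")` gives 3, since the loops make the operators left-associative. Each error message is raised at the exact rule that failed, which is part of why hand-written parsers are easy to tune for error reporting.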