by quincunx 1898 days ago
Hey, thanks for taking the time. I appreciate the insights.

Carburetta is designed for developer productivity, and hence for "least surprise". It keeps the classical distinction between scanner ("lex") and parser ("yacc") but fuses the two capabilities into a single tool.

This simplifies development: the generated driver code can do more of the housekeeping, and the code you write in Carburetta becomes more expressive. The goal is not to improve on the algorithms or the academic state of the art in parsing. The goal is to improve productivity using predictable, well-known approaches rather than reinventing the wheel, however noble that can be. The algorithms are NFA -> DFA -> scan table for the scanner and LALR for the parser; both are old and well established.
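To make "scan table" concrete, here is a minimal sketch of a table-driven DFA scanner in plain C. It is not Carburetta's generated code; the token kinds, state names, character classes and the scan_token() function are all made up for illustration. It recognizes identifiers and integers and returns the longest match.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not Carburetta output. */
enum token_kind { TOK_NONE = 0, TOK_IDENT, TOK_INT };
enum { ST_START = 0, ST_IDENT, ST_INT, ST_COUNT, ST_DEAD = -1 };
enum char_class { CC_LETTER, CC_DIGIT, CC_OTHER, CC_COUNT };

/* Map each input byte to a small character class to keep the table narrow. */
static enum char_class classify(unsigned char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return CC_LETTER;
    if (c >= '0' && c <= '9')
        return CC_DIGIT;
    return CC_OTHER;
}

/* DFA transition table: next state for each (state, character class). */
static const int scan_table[ST_COUNT][CC_COUNT] = {
    /* ST_START */ { ST_IDENT, ST_INT,   ST_DEAD },
    /* ST_IDENT */ { ST_IDENT, ST_IDENT, ST_DEAD },
    /* ST_INT   */ { ST_DEAD,  ST_INT,   ST_DEAD },
};

/* Token accepted in each state; TOK_NONE marks a non-accepting state. */
static const enum token_kind accepts[ST_COUNT] = { TOK_NONE, TOK_IDENT, TOK_INT };

/* Scan the longest token at the start of input; its length goes to *len_out. */
enum token_kind scan_token(const char *input, size_t *len_out) {
    int state = ST_START;
    enum token_kind best = TOK_NONE;
    size_t best_len = 0;
    for (size_t i = 0; input[i] != '\0'; i++) {
        state = scan_table[state][classify((unsigned char)input[i])];
        if (state == ST_DEAD)
            break;
        if (accepts[state] != TOK_NONE) {
            /* Remember the most recent accepting position (longest match). */
            best = accepts[state];
            best_len = i + 1;
        }
    }
    *len_out = best_len;
    return best;
}
```

A real generator derives the table from the regular expressions via the NFA and DFA steps; here it is written out by hand.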

That said, have a look at xlalr.c if you'd like to go exotic and disambiguate grammars (multi-symbol reductions determined at runtime). A conscious choice was made to stick with straight LALR for least surprise; ambiguous grammars, I find, become difficult to think about. xlalr.c is therefore not used for the main parser generation, and the classic lalr.c is favored instead. You should be able to understand everything that's going on using only basic textbook material (e.g. the Dragon Book). The xlalr.c algorithm will be phased out eventually; I believe it is currently only used internally (Carburetta, due to the bootstrap problem, is a bit of a chimera and does not always use itself).

There is, however, direct support for some of the ambiguities you mention. For instance, the infamous "typedef name" ambiguity on identifiers in C can be handled with <prefix>stack_accepts() and $chg_term(terminal). Have a look at the manual (the HTML version is online) for those.
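For anyone unfamiliar with the typedef-name problem, here is a rough sketch of the decision involved, in plain C. It deliberately does not use Carburetta's real signatures (see the manual for those); typedef_table, classify_identifier() and the parser_accepts_typedef_name flag are hypothetical stand-ins. The flag plays the role of asking the parser whether a typedef name would be accepted at this point, and the returned terminal plays the role of changing the terminal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; not Carburetta's API. */
enum terminal { TERM_IDENT, TERM_TYPEDEF_NAME };

#define MAX_TYPEDEFS 16

/* Names introduced by earlier typedef declarations. */
struct typedef_table {
    const char *names[MAX_TYPEDEFS];
    size_t count;
};

/* Record a typedef name; returns 0 if the table is full, 1 otherwise. */
int typedef_table_add(struct typedef_table *t, const char *name) {
    if (t->count >= MAX_TYPEDEFS)
        return 0;
    t->names[t->count++] = name;
    return 1;
}

static int typedef_table_has(const struct typedef_table *t, const char *name) {
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Decide which terminal an identifier should be handed to the parser as.
 * It becomes a typedef name only if it was declared as one AND the parser
 * could accept a typedef name in its current state. */
enum terminal classify_identifier(const struct typedef_table *t,
                                  const char *name,
                                  int parser_accepts_typedef_name) {
    if (parser_accepts_typedef_name && typedef_table_has(t, name))
        return TERM_TYPEDEF_NAME;
    return TERM_IDENT;
}
```

Consulting the parser state first is what separates this from the classic "lexer hack", where the scanner reclassifies identifiers without knowing what the parser expects.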

RawParser is cool, but seems to have different goals in mind.