| HN Mirror


by logicalshift 1254 days ago

I wrote a parser generator quite a long time ago that I think improves the syntax quite a lot, and which has an interesting approach to generalisation: you can write conditions on the lookahead (which are just grammars that need to be matched in order to pick a given rule when a conflict needs to be resolved). This construct makes it much easier to write a grammar that matches how a language is designed.

Here's an ANSI-C parser, for example: https://github.com/Logicalshift/TameParse/blob/master/Exampl... - this is an interesting example because `(foo)(bar)` is fully ambiguous in ANSI C: it can be a cast or a function call depending on whether `foo` is a type or a variable.
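To make the cast-versus-call case concrete, here is a rough hand-written sketch in Python. It is not TameParse syntax and not how the generated parser works internally: `classify`, `guard_is_cast`, and the `type_names` set are hypothetical names made up for illustration. The only idea it tries to show is a guard, a small pattern that must match the lookahead before one of two conflicting rules is chosen.

```python
import re

def tokenize(src):
    # Identifiers and parentheses only; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|[()]", src)

def guard_is_cast(tokens, pos, type_names):
    # Hypothetical lookahead guard: matches the mini-grammar "( type-name )"
    # starting at pos, without consuming any tokens.
    return (
        pos + 2 < len(tokens)
        and tokens[pos] == "("
        and tokens[pos + 1] in type_names
        and tokens[pos + 2] == ")"
    )

def classify(src, type_names):
    # Handles only the shape "(a)(b)"; anything else is rejected.
    tokens = tokenize(src)
    if len(tokens) != 6 or tokens[0::3] != ["(", "("] or tokens[2::3] != [")", ")"]:
        raise ValueError("expected input of the form (a)(b)")
    first, second = tokens[1], tokens[4]
    # The conflict between the two rules is resolved by the guard,
    # so the choice is stated explicitly rather than left arbitrary.
    if guard_is_cast(tokens, 0, type_names):
        return ("cast", first, second)
    return ("call", first, second)
```

With `foo` registered as a type name, `classify("(foo)(bar)", {"foo"})` takes the cast rule; with an empty set of type names the same input is treated as a call.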

The new construct makes it possible to extend grammars and disambiguate them - here's a C99 grammar that extends the ANSI-C grammar: https://github.com/Logicalshift/TameParse/blob/master/Exampl....

It also allows matching at least some context-sensitive languages - see https://github.com/Logicalshift/TameParse/blob/master/Exampl...

An advantage over GLR or backtracking approaches is that this still detects ambiguities in the language, so it's much easier to write a grammar that doesn't end up running in exponential time or space. Also, when an ambiguity is resolved by the generalisation, which version gets chosen is specified by the grammar, rather than being arbitrary (backtracking) or left until later (GLR).

I was working on improving error handling when I stopped work on this, but my approach here was not working out.

(This is a long-abandoned project of mine, but the approach to ambiguities and the syntax seem novel to me and were definitely an improvement over anything else I found at the time. The lexer language has a few neat features in it too.)