Hacker News
by rstuart4133 319 days ago
> It's important to note that ambiguities are something which exist in service of parser generators and the restricted formal grammars that drive them. They do not actually exist in the language to be parsed

Only partially true. How do you define the language to be parsed? It's with a grammar. If the grammar can yield two different parse trees for the same input, it's ambiguous. In LR parlance, if your grammar is ambiguous because of a shift-reduce conflict, it's because you stuffed up your grammar.

That's a real problem. It's the difference between parsing "1 + 2 / 3" as "(1 + 2) / 3" and "1 + (2 / 3)". The two interpretations yield very different outcomes. The reason you see so many people here say "use a generated LL or LR parser" is that the generator will find and report that mistake. It's a very easy mistake to make, and you won't realise you've made it.
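To make that concrete, here's a rough Python sketch of the two trees. The tuple encoding and the `evaluate` helper are my own invention for illustration, and the grammar in the comment is just one example of an ambiguous one:

```python
from fractions import Fraction

def evaluate(tree):
    """Evaluate a parse tree: either a number or an (op, left, right) tuple."""
    if not isinstance(tree, tuple):
        return Fraction(tree)
    op, left, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs / rhs

# Two parse trees that an ambiguous grammar such as
#   E -> E '+' E | E '/' E | NUM
# allows for the same input "1 + 2 / 3".
divide_last = ("/", ("+", 1, 2), 3)    # (1 + 2) / 3
divide_first = ("+", 1, ("/", 2, 3))   # 1 + (2 / 3)

result_a = evaluate(divide_last)
result_b = evaluate(divide_first)
print(result_a, result_b)
```

Same input string, two legal trees, two different answers. Nothing in the grammar says which one you get.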

Then there are what LR calls reduce-reduce conflicts. Yes, that may happen because the LR parser can't look far enough ahead. Or, it may again be because you've stuffed up your grammar. Or it may be because the language you have in your head really isn't context free. Perl is in the last category. They claim to have got around it by saying it's a "do what I mean" language. Fine, but it turns out in some cases what they think a string obviously means doesn't agree with what I thought it obviously meant.

1 comment

> How do you define the language to be parsed? It's with a grammar.

False. This is how you define a language _to a parser generator_, but it is not how humans (and/or developers) define languages to each other.

> you won't realise you've made it

This is literally impossible in a recursive descent parser. I'm not saying getting it wrong is impossible, of course not. But what you literally cannot do (without concerted intentional effort) is make it ambiguous. Your parser will parse one first, or the other first, or either one left-to-right; and you will know which of these it does by reading the code.
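A toy sketch of what I mean, in Python. The function names, the tokenizer, and the tuple tree shape are all made up for illustration; it only handles integers, "+" and "/", and does no error handling:

```python
import re

def tokenize(text):
    # Pick out integers and the two operators; any other character is skipped.
    return re.findall(r"\d+|[+/]", text)

def parse_expr(tokens, pos=0):
    """expr := term ('+' term)*  ('+' is handled at the top, so it binds loosest)."""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)
    return node, pos

def parse_term(tokens, pos):
    """term := NUM ('/' NUM)*  ('/' is handled one level down, so it binds tighter)."""
    node, pos = int(tokens[pos]), pos + 1
    while pos < len(tokens) and tokens[pos] == "/":
        # Fold each new operand into the tree on the left: left-associative.
        node = ("/", node, int(tokens[pos + 1]))
        pos += 2
    return node, pos

tree, _ = parse_expr(tokenize("1 + 2 / 3"))
chain, _ = parse_expr(tokenize("8 / 4 / 2"))
print(tree)
print(chain)
```

There is exactly one tree this code can produce for a given input. Whether it's the tree you wanted is a separate question, but the answer is sitting right there in the call structure: `parse_expr` calls `parse_term`, so "/" groups before "+".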