|
|
|
|
|
by nrds
325 days ago
|
|
> ambiguities It's important to note that ambiguities are something which exist in service of parser generators and the restricted formal grammars that drive them. They do not actually exist in the language to be parsed (unless that language is not well-specified, but then all bets are off and it is meaningless to speak of parsing), because they can be eliminated by side-conditions. For example, one famous ambiguity is the dangling 'else' problem in C. But this isn't an actual ambiguity in the C language: the language has a side-condition which says that 'else' matches to the closest unmatched 'if'. This is completely unambiguous and so a recursive descent parser for C simply doesn't encounter this problem. It is only because parser generators, at least in their most academic form, lack a way to specify this side-condition that their proponents have to come up with a whole theory of "ambiguities". (Shockingly, Wikipedia gets this exactly right in the article on dangling else which I just thought to look up: "The dangling else is a problem in programming of parser generators".) Likewise goes the problem of left-recursion. Opponents of recursive descent always present left-recursion as a gotcha which requires some special handling. Meanwhile actual programmers writing actual recursive descent parsers don't have any idea what these academics are talking about because the language that they're parsing (as it exists in their mind) doesn't feature left-recursion, but instead iteration. Left-recursion is only introduced in service of restricted formal grammars in which recursion is the only available primitive and iteration either doesn't exist or is syntactic sugar for recursion. For the recursive descent user, iteration is a perfectly acceptable primitive. The reason for the discrepancy goes back to side-conditions: iteration requires a side-condition stating how to build the parse tree; parser generators call this "resolving the ambiguity" because they can't express this in their restricted grammar, not because the language was ambiguous. |
|
Only partially true. How do you define the language to be parsed? It's with a grammar. If the grammar can yield two different parse trees for the same input, it's ambiguous. In LR parlance, if your grammar is ambiguous because of a shift-reduce conflict, it's because you stuffed up your grammar.
That's a real problem. It the difference between parsing "1 + 2 / 3" as "(1 + 2) / 3" and "1 + (2 / 3)". The two interpretations yield very different outcomes. The reason you see so many people here say "use a generated LL or LR parser" is the generator will find and report that mistake. It's a very easy mistake to make, and you won't realise you've made it.
Then there are what LR calls reduce-reduce conflicts. Yes, that may happen because the LR parser can't look far enough ahead. Or, it may again be because you've stuffed you grammar. Or it may be because the language you have in your head really isn't context free. Perl is in the last category. They claim to have got around it by saying its a "do what I mean" language. Fine, but it turns out in some cases what they think a string obviously means doesn't agree with what I thought it obviously meant.