by wycats
3390 days ago
> "favorite" wart at the moment: '% x ' parses to the literal string "x" - "%", when not preceded by an operand that makes it the infix operator "%", starts a quote-sequence where the following character indicates what the quote character should be - with the exception of a few special characters, most characters will set the quote character to its identity. So in '% x ', the quote character is space.

How is this different, in principle, from any other unary/binary operator like plus? In most languages, when `+` is preceded by an expression, it's a binary operator; otherwise it's a unary operator. The same seems true here: when `%` is preceded by an expression it's binary `%`; otherwise it's a unary operator with the semantics you describe.
The difference between this and "+" is that if I present you with "+ x ", you know that this represents two tokens: "+" and "x". If you don't know what preceded it, you don't know if "+" is a prefix or infix operator, or what parse to return, but assuming the string starts at a token boundary, you can unambiguously tokenize it lexically without additional knowledge.
But for "% x ", you can't tell in isolation whether it is a single token, the literal string "x", or the infix operator "%" followed by the identifier "x".
It's one of the features that prevent you from doing bottom-up lexical analysis of Ruby without doing a full parse and pushing information down from the parser.
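A toy model makes the difference concrete. The Python sketch below is hypothetical and is not Ruby's real lexer: `tokenize` and its `after_operand` flag are names invented for illustration, and it only handles identifiers, `+`, `%` and `%`-quoted strings (real Ruby also has `%w`, `%q`, bracket delimiters and more). The flag stands in for the information the parser would have to push down; note that only the `%` rule consults it.

```python
def tokenize(src: str, after_operand: bool) -> list[tuple[str, str]]:
    """Toy tokenizer for a tiny Ruby-like subset (hypothetical sketch).

    `after_operand` says whether the text is preceded by an expression.
    The '+' rule never looks at it; the '%' rule cannot work without it.
    """
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c == "%" and not after_operand:
            # Prefix position: the very next character, even a space,
            # becomes the quote character (bracket pairs are ignored here).
            if i + 1 >= len(src):
                raise SyntaxError("unterminated % literal")
            delim = src[i + 1]
            end = src.find(delim, i + 2)
            if end == -1:
                raise SyntaxError("unterminated % literal")
            tokens.append(("STRING", src[i + 2:end]))
            i = end + 1
            after_operand = True
        elif c.isspace():
            i += 1
        elif c in "+%":
            # Operator token; its spelling is the same whether it ends up
            # prefix or infix, so the token stream does not depend on context.
            tokens.append(("OP", c))
            i += 1
            after_operand = False
        elif c.isalpha() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("IDENT", src[i:j]))
            i = j
            after_operand = True
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    return tokens
```

With this sketch, `tokenize("+ x ", True)` and `tokenize("+ x ", False)` both return an `OP` token followed by an `IDENT` token, while `tokenize("% x ", False)` returns a single `STRING` token for "x" and `tokenize("% x ", True)` returns the operator `%` followed by the identifier `x`.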
As I said, it's not hard - in an operator precedence parser, if your value stack is not empty when you see "%", you need to parse what follows as an expression; if it is empty, you need to parse it as a quoted string. There are a variety of ways to do that. But it's relatively uncommon for a language to be impossible to tokenize unambiguously without higher-level processing.
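The operator-precedence approach described above can be sketched as a scannerless parser in which the parser itself decides how to read `%`. This is a hypothetical Python illustration over a tiny grammar (integers, binary `+` and `%`, and `%`-quoted strings), not how Ruby's parser is implemented; the names `parse`, `Str`, `Num` and `BinOp` are invented here. One assumption goes beyond the text: instead of a bare emptiness test, it treats an operand as pending when the value stack holds more entries than the operator stack. That coincides with "value stack not empty" at the start of an expression, and also handles input like `1 + % x `.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Str:
    value: str


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Str, Num, BinOp]

# '%' binds tighter than '+', as modulo does in Ruby; both are left-associative.
PREC = {"+": 1, "%": 2}


def parse(src: str) -> Node:
    values: list[Node] = []
    ops: list[str] = []

    def reduce_top() -> None:
        # Pop one binary operator and its two operands, push the combined node.
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        values.append(BinOp(op, left, right))

    i = 0
    while i < len(src):
        c = src[i]
        # An operand is pending when values outnumber binary operators;
        # at the start of input this is simply "value stack is non-empty".
        operand_pending = len(values) > len(ops)
        if c == "%" and not operand_pending:
            # No left operand: parse a %-literal; the next char is the delimiter.
            if i + 1 >= len(src):
                raise SyntaxError("unterminated % literal")
            delim = src[i + 1]
            end = src.find(delim, i + 2)
            if end == -1:
                raise SyntaxError("unterminated % literal")
            values.append(Str(src[i + 2:end]))
            i = end + 1
        elif c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            values.append(Num(int(src[i:j])))
            i = j
        elif c in PREC:
            if not operand_pending:
                raise SyntaxError("unary operators are not supported in this sketch")
            # Infix operator: reduce operators of equal or higher precedence first.
            while ops and PREC[ops[-1]] >= PREC[c]:
                reduce_top()
            ops.append(c)
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    while ops:
        reduce_top()
    if len(values) != 1:
        raise SyntaxError("malformed expression")
    return values[0]
```

Under this sketch, `parse("7 % 3")` yields a modulo node, while `parse("% x ")` yields `Str("x")` and `parse("1 + % x ")` yields an addition whose right operand is `Str("x")`. The lexing decision for `%` lives inside the parser loop, which is exactly the coupling described above.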
I've also yet to see a single example of Ruby code where this freedom to pick pretty much any quote character has been used in a sensible way - or at all (I certainly have seen cases where "expected" quote characters have been used). Thankfully. So it's an unnecessary wart.