by wycats 3390 days ago
> "favorite" wart at the moment: '% x ' parses to the literal string "x". When "%" is not preceded by an operand that makes it the infix operator "%", it starts a quote sequence in which the following character determines the quote character; with the exception of a few special characters, most characters set the quote character to themselves. So in '% x ', the quote character is space.
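A minimal Python sketch of the delimiter rule described in the quote. It assumes the "%" has already been identified as the start of a literal rather than the infix operator. The name `read_percent_literal` is hypothetical, and Ruby's type letters (`%q`, `%w`, ...) and escape sequences are deliberately left out.

```python
# Closing delimiters for bracket-style openers; in this sketch any other
# non-alphanumeric character is its own closing delimiter.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def read_percent_literal(src: str, pos: int) -> tuple[str, int]:
    """Read a %-literal starting at src[pos] == "%".

    Returns (body, end), where end is the index just past the closing
    delimiter. Simplified: no type letters, no escapes.
    """
    if src[pos] != "%" or pos + 1 >= len(src):
        raise ValueError("expected '%' followed by a delimiter")
    opener = src[pos + 1]
    if opener.isalnum():
        raise ValueError("type letters such as %q or %w are not handled here")
    closer = PAIRS.get(opener, opener)
    depth = 0  # nesting depth; only changes for bracket-style delimiters
    start = i = pos + 2
    while i < len(src):
        ch = src[i]
        if ch == closer and depth == 0:
            return src[start:i], i + 1
        if opener != closer:
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
        i += 1
    raise ValueError("unterminated %-literal")
```

With a space as the delimiter, `read_percent_literal("% x ", 0)` gives `("x", 4)`; bracket delimiters nest, so `"%(a(b)c)"` gives the body `"a(b)c"`.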

How is this different, in principle, from any other operator that can be unary or binary, like plus? In most languages, when `+` is preceded by an expression, it's a binary operator; otherwise, it's a unary operator.

The same seems true here: when `%` is preceded by an expression, it's binary `%`; otherwise, it's a unary operator with the semantics you describe.
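The rule described here can be sketched in Python, assuming the input has already been split into tokens (for `%`, that assumption is exactly what's in question). The helper names `ends_operand` and `classify` are hypothetical, and treating an identifier, a number, or `)` as the end of an operand is a simplification.

```python
OPERATORS = {"+", "%"}


def ends_operand(tok: str) -> bool:
    # Simplified: identifiers, integer literals and ")" end an operand.
    return tok.isidentifier() or tok.isdigit() or tok == ")"


def classify(tokens: list[str]) -> list[tuple[str, str]]:
    """Label each operator as prefix or infix based on the token before it."""
    out = []
    prev = None
    for tok in tokens:
        if tok in OPERATORS:
            kind = "infix" if prev is not None and ends_operand(prev) else "prefix"
            out.append((tok, kind))
        else:
            out.append((tok, "other"))
        prev = tok
    return out
```

For example, `classify(["a", "%", "b"])` labels `%` as infix, while `classify(["%", "b"])` labels it prefix.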


It's not horribly hard to parse. It is however ugly and surprising to almost everyone that sees it.

The difference between this and "+" is that if I present you with "+ x ", you know that it represents two tokens: "+" and "x". If you don't know what preceded it, you don't know whether "+" is a prefix or infix operator, or what parse to return. But assuming the string starts at a token boundary, you can tokenize it unambiguously, using lexical rules alone, without additional knowledge.
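For contrast, a context-free tokenizer in Python: every token is decided from the characters alone, so "+ x " always yields `["+", "x"]` no matter what came before it. The regex and token set are assumptions for illustration, and `%` is left out on purpose, since no rule of this kind can handle it correctly.

```python
import re

# Identifiers, integers, and a few single-character operators.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[+\-*/()]")


def tokenize(src: str) -> list[str]:
    """Split src into tokens using only the characters themselves."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens
```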

But for "% x ", you can't tell in isolation whether it is a single token (the literal string "x") or the infix operator "%" followed by the identifier "x".
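A Python sketch of the two readings: the same four characters lex differently depending on one bit of state that only the parser has. `lex_percent` and its `after_operand` flag are hypothetical names; the literal case handles only a single non-bracket delimiter, and everything after an infix `%` is simply treated as identifiers.

```python
def lex_percent(src: str, after_operand: bool) -> list[tuple[str, str]]:
    """Tokenize input that starts with "%", given one bit of parser state."""
    if not src.startswith("%"):
        raise ValueError("input must start with '%'")
    if after_operand:
        # Infix "%": the rest is ordinary whitespace-separated tokens.
        return [("OP", "%")] + [("IDENT", t) for t in src[1:].split()]
    # Start of a %-literal: the next character is the quote character.
    delim = src[1]
    end = src.index(delim, 2)
    return [("STRING", src[2:end])]
```

`lex_percent("% x ", after_operand=False)` yields one `STRING` token, while `after_operand=True` yields `OP` followed by `IDENT`.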

It's an example of one of the features that prevent you from doing bottom-up lexical analysis of Ruby without a full parse that pushes information down from the parser.

As I said, it's not hard. In an operator precedence parser, if your value stack is not empty when you see "%", you need to parse what follows as an expression; if it is empty, you need to parse it as a quoted string. There are a variety of ways to do that. But it's relatively uncommon for a language to be impossible to tokenize unambiguously without higher-level processing.
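A minimal operator-precedence (shunting-yard style) sketch in Python, where the parser's state decides how `%` is lexed. Assumptions: only identifiers, binary `+` and `%`, and %-literals with a non-bracket delimiter are supported, and the precedences in `PREC` are illustrative. Rather than testing whether the value stack is empty, it tracks whether an operand is expected, since in `a + % x ` the stack already holds `a` when the `%` arrives.

```python
PREC = {"+": 1, "%": 2}  # illustrative precedences; both left-associative


def parse(src: str):
    """Parse into a nested-tuple AST such as ("%", ("var", "a"), ("var", "b"))."""
    values = []  # value (operand) stack of AST nodes
    ops = []  # operator stack
    expect_operand = True  # True at the start and right after an operator
    i = 0

    def reduce_top():
        # Pop one operator and its two operands; push the combined node.
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        values.append((op, left, right))

    while i < len(src):
        ch = src[i]
        if ch == "%" and expect_operand:
            # No operand to the left: lex a %-literal delimited by src[i + 1].
            delim = src[i + 1]
            end = src.index(delim, i + 2)  # simplified: no brackets, no escapes
            values.append(("str", src[i + 2:end]))
            i = end + 1
            expect_operand = False
        elif ch.isspace():
            i += 1
        elif ch in PREC:
            if expect_operand:
                raise SyntaxError(f"prefix {ch!r} is not handled in this sketch")
            # Infix: reduce stacked operators that bind at least as tightly.
            while ops and PREC[ops[-1]] >= PREC[ch]:
                reduce_top()
            ops.append(ch)
            i += 1
            expect_operand = True
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            values.append(("var", src[i:j]))
            i = j
            expect_operand = False
        else:
            raise SyntaxError(f"unexpected {ch!r} at {i}")

    if expect_operand:
        raise SyntaxError("expression ends where an operand was expected")
    while ops:
        reduce_top()
    if len(values) != 1:
        raise SyntaxError("incomplete expression")
    return values[0]
```

Here `parse("a % b")` produces a modulo node, `parse("% x ")` produces `("str", "x")`, and `parse("a + % x ")` adds `a` to the literal.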

I've also yet to see a single example of Ruby code where this freedom to pick pretty much any quote character has been used sensibly, or at all, thankfully (I have certainly seen the "expected" quote characters used). So it's an unnecessary wart.