| HN Mirror

by mjs7231 1935 days ago

Shouldn't this be considered a bug in Python? Why does it even try to evaluate 0xfor without the space? Trying a few other things:

* 0xfor1 evaluates.

* 1or 2 evaluates.

* 1or2 doesn't.

* ''or'foo' evaluates.

This is gross.

layer8 1935 days ago

That’s the normal way lexers work, given “tight” token definitions. They continue adding to the current token until an invalid (for the current token type) character is reached, and then begin parsing a new token starting with the “invalid” (but now valid for the next token) character (or the next non-whitespace character).

“1or2” is lexed into “1” (integer) followed by “or2” (identifier), which is valid on the lexer level but then fails on the grammar level.
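You can see the split layer8 describes by asking Python's own `tokenize` module for the token stream. This is a minimal sketch: `lex` is a hypothetical helper name, it assumes a single-line snippet, and it drops the NEWLINE/ENDMARKER bookkeeping tokens. Note that `tokenize` reports keywords such as `or` as ordinary NAME tokens.

```python
import io
import tokenize


def lex(source):
    """Return (token type name, token text) pairs for a one-line snippet.

    Hypothetical helper: skips the NEWLINE/NL/ENDMARKER tokens that
    tokenize adds at the end of the input.
    """
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


print(lex("1or2"))   # [('NUMBER', '1'), ('NAME', 'or2')]
print(lex("0xfor"))  # [('NUMBER', '0xf'), ('NAME', 'or')]
```

The lexer never looks back. Once `or2` has been read as a single identifier, the grammar sees `1 or2` and rejects it, while `0xf` followed by `or` is a perfectly ordinary token sequence.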

sabhiram 1935 days ago

The lexer is, unfortunately, a greedy token matcher. As soon as 0xf "made sense" to it and 0xfo did not, it did the same thing it would do with something like 0xf+3, except that here the + was an `or`, which is kosher. You can add an idempotent step that inserts extra spaces between tokens before the AST is formed, which makes this sort of thing easier to spot. The good news is that with a decent lint/format flow, issues like these are easy to catch.
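One way to picture the idempotent spacing step is to re-emit the token stream with exactly one space between tokens. This is a hypothetical sketch (`space_tokens` is not a standard tool) that assumes a single-line expression. It is not how any particular formatter actually works.

```python
import io
import tokenize


def space_tokens(expr):
    """Hypothetical normalizer: re-emit a single-line expression with
    exactly one space between consecutive tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(expr).readline
    return " ".join(
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    )


once = space_tokens("0xfor 1")
print(once)                        # 0xf or 1
print(space_tokens(once) == once)  # True
```

For simple expressions like this one, the spaced output re-lexes into the same tokens, so a second pass changes nothing. That is what makes the step idempotent, and it also makes the hidden `0xf or` split visible.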

gfiorav 1935 days ago

Probably a lexer bug. "foo"or should never be processed as "foo" followed by the token OR.

njharman 1935 days ago

Why not? "or" is an operator, like "+", and "foo"+"bar" should be valid. Why have a special, inconsistent case for "or"?
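This comparison is easy to check. A closing quote already ends a string token, so both `+` and `or` can sit directly between two string literals without any spaces. The variable names below are only for illustration.

```python
# `+` concatenates the two string operands.
concatenated = "foo"+"bar"

# `or` returns its first operand if that operand is truthy.
first_truthy = "foo"or"bar"

# The empty string is falsy, so `or` returns the second operand.
fallback = ''or'foo'

print(concatenated, first_truthy, fallback)  # foobar foo foo
```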

gfiorav 1935 days ago

You know what? You’re right.

I guess it’s an operator token after all.

kristaps 1935 days ago

Not by design, fortunately: https://bugs.python.org/issue43833

xxpor 1935 days ago

That hasn't been confirmed. Are we sure that it's not an inherent ambiguity in the grammar?

joshuamorton 1935 days ago

That's not totally clear. A bug being filed doesn't mean it's accepted. And this has been (ab)used for quite some time in various Python code golf answers. See https://codegolf.stackexchange.com/a/56 from 2011.

Sohcahtoa82 1935 days ago

It is 100% by design. It's even documented.

https://docs.python.org/3/reference/lexical_analysis.html#wh...

This is not a bug.

_kst_ 1935 days ago

The cited documentation says:

> Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).

The two tokens in this case are "0xf" and "or". Their concatenation cannot be interpreted as a different token, because "0xfor" is not a valid token. Therefore, if I'm reading the rule correctly, whitespace is needed in this case.

"0xffor" is another interesting case. It's also not a valid token, but it could be interpreted as two tokens in two different ways: "0xf" "for" or "0xff" "or". (Python does the latter. I presume it uses something like C's "maximal munch" rule.)
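A toy lexer can illustrate the "maximal munch" behaviour. At each position it tries every token pattern and keeps the longest match, so `0xffor` becomes `0xff` `or` rather than `0xf` `for`. The token classes here are a hypothetical, heavily simplified stand-in, not Python's actual lexical grammar.

```python
import re

# Hypothetical, simplified token classes (not Python's real grammar).
TOKEN_PATTERNS = [
    ("HEX", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("DEC", re.compile(r"[0-9]+")),
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("OP", re.compile(r"[-+*/()]")),
]
SPACE = re.compile(r"\s+")


def munch(text):
    """Maximal munch: at each position, keep the longest match among all
    token patterns (ties go to the pattern listed first)."""
    tokens, pos = [], 0
    while pos < len(text):
        space = SPACE.match(text, pos)
        if space:
            pos = space.end()
            continue
        best = None
        for _kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best.end()):
                best = m
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(best.group())
        pos = best.end()
    return tokens


print(munch("0xffor"))  # ['0xff', 'or']
print(munch("0xf+3"))   # ['0xf', '+', '3']
```

The lexer never backtracks to try the shorter `0xf` prefix, so only one of the two possible readings is ever produced.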

Sohcahtoa82 1934 days ago

Your conclusion is the exact opposite of what the documentation explicitly states.

> Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).

Because the concatenation of "0xf" and "or" can't be interpreted as a different token, the whitespace is not needed.
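The documented rule can be turned into a rough mechanical test. Glue the two tokens together, re-lex the result, and check whether the same two tokens come back. `needs_whitespace` and `token_texts` are hypothetical helper names. The check relies on Python's `tokenize` module, so it only approximates the reference manual's wording and says nothing about how CPython implements it internally.

```python
import io
import tokenize


def token_texts(source):
    """Token strings of a one-line snippet, minus end-of-input bookkeeping."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [t.string for t in tokenize.generate_tokens(readline) if t.type not in skip]


def needs_whitespace(left, right):
    """Hypothetical reading of the rule: whitespace is needed only if the
    concatenation of the two tokens lexes as something else."""
    return token_texts(left + right) != [left, right]


print(needs_whitespace("a", "b"))     # True: "ab" is a single NAME
print(needs_whitespace("0xf", "or"))  # False: "0xfor" still lexes as 0xf, or
print(needs_whitespace("2", "+"))     # False
```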

_kst_ 1934 days ago

You're right, and I was wrong.

I dislike the rule, and I strongly think that "0xfor" should require whitespace between "0xf" and "or" (I'm sure that influenced my reading), but you're right about what the rule says.

(Apparently I can't edit my previous comment.)

goto11 1934 days ago

It is not strictly speaking a bug, since it works as intended. But it is clearly a counter-intuitive behavior and could be improved. Making 0xfor a syntax error would definitely be an improvement.

But requiring whitespace between all tokens is not an acceptable solution, since "2+2" should work. Always requiring whitespace between alphanumeric characters in different tokens would make sense.
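This suggestion could be approximated as a lint check over the token stream. Flag two adjacent tokens that touch (no whitespace between them) when the first ends with an alphanumeric character and the second begins with one. `glued_alnum_tokens` is a hypothetical name, and this is a sketch rather than an existing linter rule.

```python
import io
import tokenize


def glued_alnum_tokens(source):
    """Hypothetical lint check: report pairs of adjacent tokens with no
    whitespace between them where an alphanumeric character ends the
    first token and starts the second."""
    readline = io.StringIO(source).readline
    # Drop empty bookkeeping tokens (implicit NEWLINE, ENDMARKER).
    tokens = [t for t in tokenize.generate_tokens(readline) if t.string]
    problems = []
    for prev, tok in zip(tokens, tokens[1:]):
        touching = prev.end == tok.start  # (row, col) positions meet
        if touching and prev.string[-1].isalnum() and tok.string[0].isalnum():
            problems.append((prev.string, tok.string))
    return problems


print(glued_alnum_tokens("0xfor 1"))    # [('0xf', 'or')]
print(glued_alnum_tokens("2+2"))        # []
print(glued_alnum_tokens("''or'foo'"))  # []
```

This flags `0xfor 1` but leaves `2+2` and `''or'foo'` alone, because operators and quote characters are not alphanumeric.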
