by layer8 1888 days ago
That’s the normal way lexers work, given “tight” token definitions (the longest-match or “maximal munch” rule). They keep adding characters to the current token until they reach a character that is invalid for the current token type, and then begin lexing a new token starting with that “invalid” character (which is valid for the next token), or, if that character is whitespace, with the next non-whitespace character.
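A minimal Python sketch of that behavior, assuming a toy language with only integers, identifiers, and an `or` keyword. The token kinds `INT`, `IDENT`, and `KEYWORD` and the function `lex` are hypothetical, not taken from any particular lexer:

```python
from typing import List, Tuple

# Hypothetical toy language: integers, identifiers, and the keyword "or".
KEYWORDS = {"or"}


def lex(src: str) -> List[Tuple[str, str]]:
    """Split src into (kind, text) tokens, extending each token as far as possible."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            # Whitespace only separates tokens; skip it.
            i += 1
            continue
        start = i
        if ch.isdigit():
            # Keep consuming while the character is still valid for an integer.
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append(("INT", src[start:i]))
        elif ch.isalpha() or ch == "_":
            # Identifiers may continue with letters, digits, or underscores.
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            word = src[start:i]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

Each branch stops at the first character that cannot extend the current token, so `lex("1or2")` yields `[("INT", "1"), ("IDENT", "or2")]`, while `lex("1 or 2")` yields `INT`, `KEYWORD`, `INT`.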

“1or2” is lexed into “1” (integer) followed by “or2” (identifier), which is valid at the lexical level but then fails at the grammar level.
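To see where the grammar-level failure comes from, here is a small hand-written parser for a hypothetical grammar `expr := INT ("or" INT)*`, fed the token stream described above for “1or2”. The grammar, the `parse_or_expr` function, and the `(kind, text)` token tuples are assumptions for illustration only:

```python
from typing import List, Optional, Tuple

# Tokens are (kind, text) pairs, e.g. ("INT", "1") or ("IDENT", "or2").
Token = Tuple[str, str]


def parse_or_expr(tokens: List[Token]) -> List[str]:
    """Check tokens against the hypothetical grammar  expr := INT ("or" INT)*.

    Returns the text of the integer operands, or raises SyntaxError on a violation.
    """
    pos = 0

    def expect(kind: str, text: Optional[str] = None) -> str:
        # Consume one token of the given kind (and exact text, if given).
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {text or kind}, got end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_kind} {tok_text!r}")
        pos += 1
        return tok_text

    operands = [expect("INT")]
    # Every further token must begin an  "or" INT  pair.
    while pos < len(tokens):
        expect("KEYWORD", "or")
        operands.append(expect("INT"))
    return operands
```

`parse_or_expr([("INT", "1"), ("KEYWORD", "or"), ("INT", "2")])` returns `["1", "2"]`, but `[("INT", "1"), ("IDENT", "or2")]` raises `SyntaxError`: every token was lexed fine, yet an identifier appears where the grammar requires the `or` keyword.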