You're right. The current parser violates the https://en.m.wikipedia.org/wiki/Principle_of_least_astonishm... when it breaks a string of characters that contains no whitespace into several tokens.

One could imagine first tokenizing based only on whitespace, and only then starting to figure out what the tokens are. That means parsing each of them individually, which means another parsing step.
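To make that concrete, here is a rough sketch of the two-step idea in Python. It is not how the parser under discussion works; the token kinds ("number", "identifier", "unknown") and the function names are made up purely for illustration. Step one splits on whitespace only; step two looks at each chunk by itself and either classifies it as a whole or reports it as unknown, but never splits it further.

```python
import re

# Hypothetical token classes, invented for this sketch only.
_CLASSIFIERS = [
    ("number", re.compile(r"[+-]?\d+(\.\d+)?")),
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]


def split_on_whitespace(source):
    """Step 1: visual separation only; no knowledge of token kinds."""
    return source.split()


def classify(chunk):
    """Step 2: decide what one whitespace-delimited chunk is.

    The chunk must match a pattern in full; it is never broken
    into smaller tokens.
    """
    for kind, pattern in _CLASSIFIERS:
        if pattern.fullmatch(chunk):
            return (kind, chunk)
    return ("unknown", chunk)


def tokenize(source):
    """Run both steps: split first, then classify each chunk."""
    return [classify(chunk) for chunk in split_on_whitespace(source)]
```

With this sketch, something like `3x` comes back as a single unknown token instead of being silently split into a number and an identifier, which is the less astonishing outcome argued for above.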

I think this would match how humans read more closely: structure is more obvious from visual separation than from detailed analysis.

I guess it wasn't done that way because the current approach means one parser to rule all sources, and that parser can handle more complicated cases. That kind of design decision leads to surprises later, but it is understandable when you draft a language at the same time as your first parser.