|
|
|
|
|
by abaines
2362 days ago
|
|
I was initially surprised that all the lexers treat "[01]" as four tokens, but it makes sense from the state diagram. In the past I've encountered JSON lexing that only considers token boundaries on "special" characters i.e. ",}]:" and whitespace. This will return a lexing error when it sees "01" (equivalently "truefalse"). |
|
One could imagine first tokenizing only based on whitespace, then only starting to figure out what the tokens are. Which means parsing them individually. Which means another parsing step.
I think this would match human more closely: structure is more obvious based on visual separation than detailed analysis.
I guess it wasn't done that way because the current way of operation means one parser to rule all sources, and that parser can handle more complicated cases. That kind of design decision is more surprising later, but is kind of understandable when you draft a language as the same time as your first parser.