by scott_s 5656 days ago
> Meaning, parse word by word until you hit a key word or a significant character (,:". etc).

If keywords are allowable in identifiers (such as "end of file"), then your algorithm is not sophisticated enough. When you encounter a token that is the same token as a keyword, you need to use context to determine if it is actually a keyword or part of an identifier.

This may be a serious problem if the grammar has "<identifier> <keyword>" in it. That is, "X keyword" could be the identifier "X keyword" or it could be the identifier "X" followed by "keyword." There's a reason that most programming languages require that identifiers are a single token.
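To make the ambiguity concrete, here is a minimal sketch that enumerates every way a run of words could be read when keywords are allowed inside multi-word identifiers. The keyword set and the `readings` function are made up for illustration; nothing here comes from the language's actual implementation.

```python
KEYWORDS = {"end", "if"}  # hypothetical keyword set, not from the language's docs


def readings(words, keywords=KEYWORDS):
    """Yield every way to read `words` as a list of tokens.

    A token is ("ident", "multi word name") or ("kw", word). A word that
    matches a keyword may be read as a keyword or as part of an identifier.
    Two identifiers may not be adjacent, since they would merge into one.
    """
    def go(i, prev_was_ident):
        if i == len(words):
            yield []
            return
        # Reading 1: treat words[i] as a keyword.
        if words[i] in keywords:
            for rest in go(i + 1, False):
                yield [("kw", words[i])] + rest
        # Reading 2: start an identifier at i that spans words[i:j].
        if not prev_was_ident:
            for j in range(i + 1, len(words) + 1):
                for rest in go(j, True):
                    yield [("ident", " ".join(words[i:j]))] + rest

    yield from go(0, False)


for reading in readings(["X", "end"]):
    print(reading)
```

With these assumptions, "X end" has two readings (identifier "X" followed by the keyword, or the single identifier "X end"), while "X y" has only one. A parser would need surrounding context to pick between the two.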

2 comments

> When you encounter a token that is the same token as a keyword, you need to use context to determine if it is actually a keyword or part of an identifier.

You're presuming here that a space delimits tokens. In this language, that may not be the case. The lexer may create a single token from "a b c".
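A rough sketch of what such a lexer might look like, assuming it follows the "word by word until a keyword or significant character" rule quoted at the top of the thread. The keyword set and the `lex` function are hypothetical; I haven't seen the language's real lexer.

```python
import re

KEYWORDS = {"if", "then", "end"}  # hypothetical keyword set
PUNCT = set(',:".')               # the "significant characters" named in the thread


def lex(source, keywords=KEYWORDS):
    """Group consecutive non-keyword words into one identifier token."""
    tokens, words = [], []

    def flush():
        # Emit the words collected so far as a single identifier.
        if words:
            tokens.append(("ident", " ".join(words)))
            words.clear()

    # Split into runs of ordinary characters and single punctuation marks.
    for piece in re.findall(r'[^\s,:".]+|[,:".]', source):
        if piece in PUNCT:
            flush()
            tokens.append(("punct", piece))
        elif piece in keywords:
            flush()
            tokens.append(("kw", piece))
        else:
            words.append(piece)
    flush()
    return tokens


print(lex("a b c"))
print(lex("end of file"))
```

Under these assumptions "a b c" does come out as one identifier token, but "end of file" is split into the keyword "end" plus the identifier "of file", which is the case the parent comment is worried about.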

> If keywords are allowable in identifiers

Big "if" (why shouldn't it disallow them?), and completely resolved by modifying your naming scheme in those situations: EndOfFile is unambiguous, as is end_of_file, ifSuccess, etc.

It's unusual, since most programming languages allow keywords to appear inside identifiers (for example, new_thing is a legal C++ identifier). Further, if I understand the language correctly, the literal "end_of_file" becomes the same identifier as "end of file". And the stated purpose of allowing white space in identifiers is to avoid camel case and underscores.

I don't think that's the case. I think the example was just to show how you can write with spaces instead of underscores. I could be wrong, though; I haven't tried the language.

The documentation doesn't say one way or the other, but it does include underscores as part of identifiers and doesn't mention stripping them. It says only that spaces are ignored entirely.
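If that reading is right, identifier comparison would behave roughly like the sketch below. The `canonical` function is hypothetical and encodes only my reading of the docs (spaces dropped, underscores kept); it is not taken from the implementation.

```python
def canonical(name):
    """Normalize an identifier: drop all whitespace, keep underscores."""
    return "".join(name.split())


# Spaces are ignored, so these would name the same identifier.
print(canonical("end of file") == canonical("endoffile"))
# Underscores are kept, so this one would stay distinct.
print(canonical("end_of_file") == canonical("end of file"))
```

So "end of file" and "endoffile" would collide, while "end_of_file" would be a separate name, assuming the docs mean what they appear to say.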