|
|
|
|
|
by jules
2722 days ago
|
|
Lexers don't do strcmps, they use a finite state machine to determine tokens. For instance if we have the tokens ALTER, AND, ANY, SELECT, FROM it will compile into a state machine like this: switch(str[i++]){
case 'A':
switch(str[i++]){
case 'L': ...
case 'N':
switch(str[i++]){
case 'D':
found token!
case 'Y:
found token!
default:
syntax error!
}
...
}
case 'S': ...
case 'F': ...
...
}
Lexers already do this, and in each place where a token is found, it can know which token it was based on the DFA state that it's in. |
|
Perfect hashing trades more computing for less cache misses.
[edit] Also, you shouldn't forget that sparse switch...case aren't O(1).