by ahaferburg 2634 days ago
I would be curious what the performance looks like on bigger, more realistic source files. And what happens if you disable optimizations: does that influence the obj generation? What about link times?

The post made me look into string interning for my compiler. I wasn't convinced that it would be that useful. I thought that most unsuccessful string comparisons are fast anyway, because I store the length for each token. With a hash map, you still have to do one comparison for every lookup, and you also have to compute the hash. But hashing also greatly increases the odds that the one comparison you do is against the right string. And once you've done the interning, you don't need to look up strings at all anymore.
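For illustration, here is a minimal sketch of string interning in Python. It is not the implementation described in this comment; the `Interner` class and its method names are hypothetical. Each distinct string is hashed and compared once on the way in, and after that two identifiers are equal exactly when their integer IDs are equal.

```python
class Interner:
    """Maps each distinct string to a small integer ID (hypothetical helper)."""

    def __init__(self):
        self._ids = {}      # string -> ID; the dict does the hash and the comparison
        self._strings = []  # ID -> string, e.g. for error messages

    def intern(self, s: str) -> int:
        """Return the ID for s, assigning a fresh one the first time s is seen."""
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid

    def lookup(self, sid: int) -> str:
        """Recover the original text for an ID."""
        return self._strings[sid]


interner = Interner()
a = interner.intern("count")
b = interner.intern("count")    # same text, so same ID as a
c = interner.intern("counter")  # different text, so a different ID
```

After interning, later stages can compare `a == b` as plain integers instead of comparing characters.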

I (very sloppily) implemented a hash map and integrated it into the lexer. Despite the poor implementation, and having to build the map in the lexer, it speeds up the check for whether an identifier is a keyword, and it reduced the parse time to about 70% of what it was. I get similar gains for code generation, because it speeds up the symbol lookup, but it's probably going to be less useful there, since I still have a terrible O(n) lookup for globals. The absolute gains are still worth it, though.
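One common way to get a cheap keyword check out of interning, sketched here under assumptions: intern all keywords before lexing starts, so they receive the lowest IDs, and the check becomes a single integer comparison. The keyword list, `Interner`, and `is_keyword` below are hypothetical and not taken from the compiler discussed here.

```python
# Assumed keyword set, for illustration only; entries must be distinct.
KEYWORDS = ("if", "else", "while", "return")


class Interner:
    """String -> integer ID table, preloaded so keywords get IDs 0..n-1."""

    def __init__(self, preload=()):
        self._ids = {}
        self._strings = []
        for s in preload:
            self.intern(s)

    def intern(self, s: str) -> int:
        sid = self._ids.get(s)
        if sid is None:
            sid = len(self._strings)
            self._ids[s] = sid
            self._strings.append(s)
        return sid


interner = Interner(KEYWORDS)


def is_keyword(sid: int) -> bool:
    # Keywords were interned first, so their IDs are below len(KEYWORDS).
    return sid < len(KEYWORDS)


tok_while = interner.intern("while")   # already present from the preload
tok_whilst = interner.intern("whilst")  # ordinary identifier, gets a higher ID
```

The lexer still pays one hash and one comparison per identifier, but the parser never touches the characters again.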

So yeah. Thanks for encouraging me to look into it!

1 comment

Hey - I am thrilled to hear about your positive experience with interning strings. I never actually did a performance test, so I am delighted to hear your gains were substantial.

As for your questions about performance on bigger source files and twiddling optimization options, I too am curious about that. I will likely revisit those questions at some point in the future. It will be easier to do once I have baked these diagnostics into the compiler.