Hacker News new | ask | show | jobs
by mikeash 5350 days ago
Would it? Loading a single byte should still be fast (in the absolute worst case, it's a single shift and a mask on top of the cost of loading 4 or 8 bytes) and using less memory means it's much more likely for the table to remain in cache (and evict less other stuff). In fact, I'd go so far as to wager that a uint8_t[16] (or uint64_t[4] or whatever) with one bit per character, manually extracted, would be the fastest way to do it. Not that I've tested this or anything, and I could certainly be wrong.
1 comments

I hate to break up perfectly good hand-waving with numbers but I put together a test. On my Core i7 with an 18mb input, which is a large novel duplicated many times, the performance factors, normalized to the if-based version are:

    if: 1x
    byte-based lookup: 0.83x
    int-based lookup: 0.84x
    bit-vector based lookup: 0.93x
http://pastebin.com/5twfXfEt