Hacker News
by DannyBee 4134 days ago
The comparison method you are using is commonly SIMD-ized and used for string matching.

Note also that Intel/AMD CPUs with SSE4+ have a popcnt instruction in 32-bit and 64-bit forms, with 3-cycle latency and 1-cycle throughput for both widths, so it is faster for counting bits/matches than any of the methods you are using :)
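
For reference, here is a rough sketch of what popcnt computes for one 32-bit word, written as the usual branch-free software fallback in TypeScript. JavaScript and TypeScript have no direct popcnt, so this runs a fixed sequence of shifts, masks, and one multiply instead. The name popcount32 is made up for illustration and is not taken from the article under discussion.

```typescript
// Hypothetical helper: counts the set bits in the low 32 bits of x.
// Branch-free, so the work done is the same for every input.
function popcount32(x: number): number {
  x = x | 0;                                        // coerce to a 32-bit integer
  x = x - ((x >>> 1) & 0x55555555);                 // bit counts per 2-bit pair
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);  // bit counts per 4-bit nibble
  x = (x + (x >>> 4)) & 0x0f0f0f0f;                 // bit counts per byte
  return Math.imul(x, 0x01010101) >>> 24;           // sum the four bytes into the top byte
}

console.log(popcount32(0b1011));
```

On hardware with popcnt, a native compiler can do all of this in a single instruction, which is the point of the comparison above.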

1 comment

Nice! Would love to see it implemented - though that popcnt instruction isn't accessible via JavaScript as far as I'm aware. ;)
Also note: your while loop isn't 0 cycles. In fact, depending on sparseness, it's probably worse than the bit-twiddling.

This is because the test and branch always execute, and the branch is essentially guaranteed to be mispredicted a lot.

These branch mispredictions are likely to cost a lot more than the n cycles it theoretically saves.
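
To make the branch cost concrete, here is a minimal sketch, assuming the while loop in question is something like the common "clear the lowest set bit until zero" loop (the original code is not shown in this thread, so that is a guess). The function name countBitsLoop and the tests counter are hypothetical, added only to show how often the loop's exit test runs.

```typescript
// Hypothetical instrumented loop: counts set bits in the low 32 bits of x
// and records how many times the loop's exit test is evaluated.
function countBitsLoop(x: number): { count: number; tests: number } {
  let v = x >>> 0;   // treat the input as an unsigned 32-bit value
  let count = 0;
  let tests = 0;
  while (true) {
    tests++;                    // the exit test runs on every pass, including the last
    if (v === 0) break;         // data-dependent branch
    v = (v & (v - 1)) >>> 0;    // clear the lowest set bit
    count++;
  }
  return { count, tests };
}

console.log(countBitsLoop(0));      // even an all-zero word pays for one test
console.log(countBitsLoop(0b1011)); // three set bits, four tests
```

The exit test runs once per set bit plus once more to leave the loop, and the iteration at which it exits changes with every input. That data dependence is what makes the exit branch hard to predict when the words being counted vary.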