Hacker News new | ask | show | jobs
by simgt 267 days ago
For the 8 bits case in the first benchmark, why isn't `Vec<u8>` just as fast as the rest? Is it due to the compiler emitting poorer instructions in that case or doing extra checks the other implementations don't?
1 comments

It's not about poorer instructions; a get_unchecked on a Vec<u8> is just a single memory access, which is as good as it gets. The difference is likely down to cache locality effects created by the benchmark loop itself.

The benchmark does a million random reads. For the FixedVec implementation with bit_width=8, the underlying storage is a Vec<u64>. This means the data is 8x more compact than the Vec<u8> baseline for the same number of elements.

When the random access indices are generated, they are more likely to fall within the same cache lines for the Vec<u64>-backed structure than for the Vec<u8>. Even though Vec<u8> has a simpler access instruction, it suffers more cache misses across the entire benchmark run.