| HN Mirror

It's not about poorer instructions; a get_unchecked on a Vec<u8> is just a single memory access, which is as good as it gets. The difference is likely down to cache locality effects created by the benchmark loop itself.

The benchmark does a million random reads. For the FixedVec implementation with bit_width=8, the underlying storage is a Vec<u64>. This means the data is 8x more compact than the Vec<u8> baseline for the same number of elements.

When the random access indices are generated, they are more likely to fall within the same cache lines for the Vec<u64>-backed structure than for the Vec<u8>. Even though Vec<u8> has a simpler access instruction, it suffers more cache misses across the entire benchmark run.