| HN Mirror

It's one of the improvements they claimed in the 2019 presentation: https://youtu.be/JZE3_0qvrMg?feature=shared&t=1054 They report a 10% speedup on find but a 15% slowdown on insert. The speedup probably comes from using 4 more bits of hash, which leads to fewer tag collisions, and the slowdown from the more complicated code needed for the mirroring.

I'm still confused about the SIMD alignment question. There are load instructions with an alignment requirement (_mm_load_si128) and without one (_mm_loadu_si128), and both are listed with the same latency and throughput. I've heard somewhere that the slowdown of an unaligned access comes from using more bandwidth: two aligned 128-bit chunks are loaded and combined into the unaligned value. But I have no idea whether this affects multiple loads from contiguous memory.