by dzaima 836 days ago
The official 1BRC was Java-only, so no using any architecture-specific SIMD at all; the test system did have AVX2 though, and that's what most non-competing native solutions (including mine) targeted.

Completely forgot about pmuludq; that works too for SSE2. But a 32-bit result is insufficient for the magic number method: it needs at least 36 bits. I originally used vpmaddubsw+vpmaddwd, but switched to vpmuldq for the reduced register pressure, and since I was already only parsing 4 numbers in ymm registers, the 64-bit result didn't affect me (after parsing 4 temperatures I immediately did the respective hashmap stuff for each).
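
For anyone unfamiliar with the magic number trick, here is a rough scalar sketch in Java of the variant that circulated among the 1BRC entries. It is an illustration under assumptions, not the SIMD code described above: the class and method names are made up, the input is assumed to be one of "1.2", "12.3", "-1.2", "-12.3", and the digit layout may well differ from the vectorized version (in this layout the answer lands in bits 32 to 41 of the product, so the exact bit count is layout-dependent). The point it shows is that one multiply sums hundreds*100 + tens*10 + units, and that the sum sits above bit 31, which is why a multiply that only keeps the low 32 bits can't be used.

```java
// Hypothetical scalar sketch of a 1BRC-style "magic number" temperature parse.
// Assumes the text is d.d, dd.d, -d.d or -dd.d (ASCII), one decimal digit.
final class MagicNumberParse {
    // 100 * 2^24 + 10 * 2^16 + 1
    static final long MAGIC = 0x640a0001L;

    // Pack up to 8 ASCII bytes little-endian, like an unaligned 64-bit load.
    static long load(String s) {
        long word = 0;
        for (int i = 0; i < s.length() && i < 8; i++) {
            word |= (long) (s.charAt(i) & 0xFF) << (8 * i);
        }
        return word;
    }

    // -1 if the first byte is '-', 0 if it is a digit (digits have bit 4 set).
    static long sign(long word) {
        return (~word << 59) >> 63;
    }

    // Shift the text so '.' lands in byte 3, then keep only the digit nibbles:
    // byte 1 = tens, byte 2 = units, byte 4 = tenths.
    static long alignDigits(long word) {
        // '.' is the first non-digit among bytes 1..3, so its bit 4 is clear.
        int dotBit = Long.numberOfTrailingZeros(~word & 0x10101000L);
        long noSign = word & ~(sign(word) & 0xFF); // zero out a leading '-'
        return (noSign << (28 - dotBit)) & 0x0F000F0F00L;
    }

    // Full 64-bit product; tens*100 + units*10 + tenths ends up at bits 32..41.
    static long product(String s) {
        return alignDigits(load(s)) * MAGIC;
    }

    // Temperature in tenths of a degree, e.g. "-12.3" gives -123.
    static int parse(String s) {
        long sgn = sign(load(s));
        long abs = (product(s) >>> 32) & 0x3FF;
        return (int) ((abs ^ sgn) - sgn); // conditional negate without a branch
    }
}
```

With this layout, `MagicNumberParse.parse("12.3")` gives 123 and `MagicNumberParse.parse("-1.2")` gives -12. The low 32 bits of `product(...)` only hold partial terms, so truncating the multiply result to 32 bits drops the value entirely.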