|
|
|
|
|
by Retr0id
252 days ago
|
|
I tried to run the benchmark on my M1 Pro macbook, but the "baseline" is written with x86 intrinsics and won't compile. Are the benchmark results in the README real? (The README itself feels very AI-generated) Looking at the makefile, it tries to link the x86 SSE "baseline" implementation and the NEON version into the same binary. A real headscratcher! Edit: The SSE impl gets shimmed via simd-everywhere, and the benchmark results do seem legit (aside from being slightly apples-to-oranges, but that's unavoidable) |
|
For the baseline you need SIMDe headers: https://github.com/simd-everywhere/simde/tree/master/simde. These alias x86 intrinsics to ARM intrinsics. The baseline is based on the previous State-of-The-Art (https://arxiv.org/abs/1209.2137) which happens to be x86-based; using SIMDe to compile was the highest-integrity way I could think of to compare with the previous SOTA.
Note: M1 chips specifically have notoriously bad small-shift performance, so the benchmark results will be very bad on your machine. M3 partially fixed this, M4 fixed completely. My primary target is server-class rather than consumer-class hardware so I'm not too worried about this.
The benchmark results were cpy-pasted from the terminal. The README prose was AI generated from my rough notes (I'm confident when communicating with other experts/researchers, but less-so with communication to a general audience).