by vardump 3979 days ago
Scalar throughput would be way below that 25 GFLOPS figure: at most 2 FLOPs per clock cycle, i.e. 2x the clock frequency. It's likely their benchmark just doesn't support AVX2 (and FMA [1]).

You get about 25 GFLOPS if you use SSE only.
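
For a rough sense of the arithmetic, here is a back-of-the-envelope sketch of peak throughput as cores x clock x FLOPs per cycle. Everything numeric in it is an assumption and not from the benchmark being discussed: the 3.0 GHz clock and single core are hypothetical, and the per-cycle figures assume single precision on a Haswell-class core (2 scalar ops per cycle, 4-wide SSE with separate add and multiply, 8-wide AVX2 with two FMA units).

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: cores * clock (GHz) * FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle


# Hypothetical machine, chosen only for illustration.
CORES = 1
CLOCK_GHZ = 3.0

# Assumed single-precision FLOPs per cycle per core (Haswell-class core assumed).
FLOPS_PER_CYCLE = {
    "scalar": 2,         # one add + one multiply per cycle, no SIMD
    "sse": 4 * 2,        # 4 lanes * (add + multiply)
    "avx2_fma": 8 * 2 * 2,  # 8 lanes * 2 FLOPs per FMA * 2 FMA units
}

estimates = {
    name: peak_gflops(CORES, CLOCK_GHZ, per_cycle)
    for name, per_cycle in FLOPS_PER_CYCLE.items()
}

for name, gflops in estimates.items():
    print(f"{name:9s} {gflops:6.1f} GFLOPS")
```

Under these assumptions the SSE-only estimate lands in the mid-20s of GFLOPS, scalar is a quarter of that, and AVX2 with FMA is four times higher. Real numbers depend on the actual CPU, core count, and precision.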

[1]: https://en.wikipedia.org/wiki/FMA_instruction_set