Hacker News
by wtallis 3979 days ago
Is it possible that they're using "sequential" strictly, to mean that the arithmetic isn't vectorized? What's the scalar throughput like?
1 comment

Scalar throughput would be way below that 25 GFLOPS number: at most 2x the clock frequency. It's more likely that their benchmark just doesn't support AVX2 (and FMA [1]).

You get about 25 GFLOPS if you use SSE only.
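
A rough back-of-the-envelope sketch of that arithmetic, in Python: peak FLOPS is roughly clock x SIMD lanes x floating-point ops issued per lane per cycle. The 3.1 GHz clock, the single core, single precision, and the per-cycle issue rates are all assumptions picked for illustration, not details of the benchmark being discussed, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(clock_ghz, simd_lanes, flops_per_lane_per_cycle, cores=1):
    """Theoretical peak in GFLOPS: clock * lanes * per-lane FLOPs per cycle * cores."""
    return clock_ghz * simd_lanes * flops_per_lane_per_cycle * cores


CLOCK_GHZ = 3.1  # assumed clock speed, not taken from the thread

# Scalar: 1 lane, at most 2 FLOPs per cycle (the "2x clock frequency" bound).
scalar = peak_gflops(CLOCK_GHZ, 1, 2)

# SSE, single precision: 4 lanes, assuming one add and one multiply issue per cycle.
sse = peak_gflops(CLOCK_GHZ, 4, 2)

# AVX2 + FMA, single precision: 8 lanes, assuming two FMA units,
# with each fused multiply-add counted as 2 FLOPs (4 per lane per cycle).
avx2_fma = peak_gflops(CLOCK_GHZ, 8, 4)

print(f"scalar:   {scalar:.1f} GFLOPS")
print(f"SSE:      {sse:.1f} GFLOPS")
print(f"AVX2+FMA: {avx2_fma:.1f} GFLOPS")
```

With those assumed numbers the SSE-only case lands near 25 GFLOPS, scalar comes out at about a quarter of that, and AVX2 with FMA would be about four times higher.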

[1]: https://en.wikipedia.org/wiki/FMA_instruction_set