by vardump 3979 days ago
The chart seems to indicate about 25 GFLOPS for sequential performance, while the theoretical single-core peak on Haswell/Broadwell at 3.1 GHz is up to about 100 GFLOPS (single precision, using AVX2 and FMA).

Realistic single-core performance won't approach 100 GFLOPS, of course, but 25 is still a pretty lowball value.

1 comments

Is it possible that they're using "sequential" strictly, to mean that the arithmetic isn't vectorized? What's the scalar throughput like?
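Scalar throughput would be way less than that 25 GFLOPS figure: at most 2 floating-point operations per clock cycle, so roughly 6 GFLOPS at 3.1 GHz (twice that if you count a scalar FMA as two operations). It's likely their benchmark just doesn't support AVX2 and FMA [1].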

You get about 25 GFLOPS if you use SSE only.
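
For anyone who wants to check the arithmetic, here is a rough sketch of where these numbers come from. The per-cycle figures are my assumptions, not anything taken from the benchmark: single precision, two floating-point instructions issued per cycle, and the helper name `peak_gflops` is just made up for illustration.

```python
# Back-of-the-envelope peak FLOPS for one core.
# All per-cycle figures below are assumptions, not measurements.
CLOCK_GHZ = 3.1  # clock speed quoted in the thread


def peak_gflops(lanes, ops_per_lane, issue_per_cycle, clock_ghz=CLOCK_GHZ):
    """Peak GFLOPS = lanes * ops per lane * instructions per cycle * GHz."""
    return lanes * ops_per_lane * issue_per_cycle * clock_ghz


# Assumed: 32-bit floats and two FP instructions issued per cycle.
scalar = peak_gflops(lanes=1, ops_per_lane=1, issue_per_cycle=2)    # one add + one mul
sse = peak_gflops(lanes=4, ops_per_lane=1, issue_per_cycle=2)       # 128-bit add + mul
avx2_fma = peak_gflops(lanes=8, ops_per_lane=2, issue_per_cycle=2)  # two 256-bit FMAs

for name, value in [("scalar", scalar), ("SSE", sse), ("AVX2+FMA", avx2_fma)]:
    print(f"{name:9s} {value:6.1f} GFLOPS")
```

Under those assumptions you get roughly 6, 25 and 99 GFLOPS, which lines up with the chart's value looking like an SSE-only result. Double precision would halve the vector numbers.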

[1]: https://en.wikipedia.org/wiki/FMA_instruction_set