Hacker News
by
wtallis
3979 days ago
Is it possible that they're using "sequential" strictly, to mean that the arithmetic isn't vectorized? What's the scalar throughput like?
vardump
3979 days ago
Scalar throughput would be way less than that 25 GFLOPS figure: at most 2x the clock frequency. It's likely their benchmark just doesn't support AVX2 (and FMA [1]).
You get about 25 GFLOPS if you use SSE only.
[1]: https://en.wikipedia.org/wiki/FMA_instruction_set
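As a rough sketch of the arithmetic behind those numbers: peak throughput is clock frequency times vector lanes times floating-point operations per lane per cycle. The thread doesn't give the clock frequency, so the 3.0 GHz below is purely hypothetical, as are the assumptions of single precision, a single core, one add plus one multiply issued per cycle without FMA, and two FMA units with AVX2.

```python
def peak_gflops(clock_ghz, lanes, flops_per_lane_per_cycle):
    """Theoretical single-core peak in GFLOPS (not a measured number)."""
    return clock_ghz * lanes * flops_per_lane_per_cycle


# Hypothetical clock frequency; the thread does not state the real one.
CLOCK_GHZ = 3.0

# Scalar: 1 lane, assuming one add + one multiply per cycle -> "2x clock frequency".
scalar = peak_gflops(CLOCK_GHZ, lanes=1, flops_per_lane_per_cycle=2)

# SSE: 4 single-precision lanes, same add + multiply assumption.
sse = peak_gflops(CLOCK_GHZ, lanes=4, flops_per_lane_per_cycle=2)

# AVX2 + FMA: 8 lanes, assuming two FMA units, each FMA counted as 2 FLOPs.
avx2_fma = peak_gflops(CLOCK_GHZ, lanes=8, flops_per_lane_per_cycle=4)

print(f"scalar:     {scalar:.1f} GFLOPS")
print(f"SSE:        {sse:.1f} GFLOPS")
print(f"AVX2 + FMA: {avx2_fma:.1f} GFLOPS")
```

Under those assumptions the SSE figure lands near the 25 GFLOPS mentioned above, while the scalar figure is a quarter of it.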