
	by Marat_Dukhan 2819 days ago
	FBGEMM is faster than theoretical peak FP32 (single-precision floating-point) performance, therefore it's faster than SGEMM/DGEMM in any BLAS library

1 comments

danmg 2818 days ago

If they're showing something higher than a 'theoretical peak' then it's a fantastic result that must be investigated carefully for any error in data collection.

Also, that doesn't stop them from showing an apples-to-apples comparison against other libraries that provide GEMM. If other libraries are reporting the same 'beyond theoretical peak' then it most certainly is a data collection error.


Marat_Dukhan 2818 days ago

Performance on the plot is higher than the FP32 peak, but there's no error, because FBGEMM does not compute in FP32; it computes in 8-bit fixed point. On a Broadwell CPU, you can do 16 FP32 multiply-adds per cycle (2x 8-wide FMA via VFMAxxxPS instructions), but 32 8-bit multiply-adds (1x 32-wide multiplication with accumulation of adjacent results via the VPMADDUBSW instruction).
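
To make the arithmetic concrete, here is a rough Python sketch of what one such 8-bit instruction computes (the mnemonic is spelled VPMADDUBSW in Intel's manuals): unsigned bytes times signed bytes, with adjacent products summed into saturated 16-bit results. The function names are made up for illustration and the code models only the semantics, not how FBGEMM actually uses the instruction.

```python
def saturate_i16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def pmaddubsw(a_u8, b_s8):
    """Emulate VPMADDUBSW semantics (hypothetical helper).

    a_u8 holds unsigned bytes (0..255), b_s8 holds signed bytes (-128..127).
    Each pair of adjacent products is summed into one saturated int16.
    """
    assert len(a_u8) == len(b_s8) and len(a_u8) % 2 == 0
    out = []
    for i in range(0, len(a_u8), 2):
        pair_sum = a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1]
        out.append(saturate_i16(pair_sum))
    return out

# Multiply-add counts quoted in the comment above.
fp32_madds = 2 * 8    # two 8-wide FMA instructions
int8_madds = 1 * 32   # one 32-wide 8-bit multiply with pairwise accumulation

# A 256-bit register holds 32 bytes, so one instruction yields 16 int16 sums.
lanes = pmaddubsw([1] * 32, [1] * 32)
```

Note the saturation: two large products in the same pair can exceed the int16 range and get clamped, which is one reason 8-bit kernels have to pick their operand ranges carefully.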


danmg 2818 days ago

Ok. Then this will introduce significant truncation errors and it's not a general GEMM. That's like claiming you've made the fastest FEM routine in the world by doing everything in half-precision.
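
For a sense of scale, the error from 8-bit arithmetic can be measured on a toy dot product. The sketch below assumes a simple symmetric linear quantization scheme chosen purely for illustration; it is not claimed to be what FBGEMM does, and the input vectors are arbitrary.

```python
def quantize(xs, bits=8):
    """Symmetric linear quantization to signed integers (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(x) for x in xs) / qmax     # real value of one integer step
    return [round(x / scale) for x in xs], scale

def quantized_dot(a, b):
    """Dot product computed on 8-bit integers, then rescaled to real values."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))   # exact integer accumulation
    return acc * sa * sb

# Arbitrary test vectors: a spans -1.0..1.0, b spans 0.2..1.2.
a = [0.1 * i - 1.0 for i in range(21)]
b = [0.05 * i + 0.2 for i in range(21)]

exact = sum(x * y for x, y in zip(a, b))
approx = quantized_dot(a, b)
rel_err = abs(approx - exact) / abs(exact)
print(f"exact={exact:.6f} approx={approx:.6f} relative error={rel_err:.2e}")
```

Whether an error of that size counts as significant depends on the workload, which is the crux of the disagreement in this thread.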
