by sidhu1f 2961 days ago
Top-of-the-line Arria-10 FPGAs have about 1500 floating-point MACs [1]. Using such a device, Intel claims ~1 TFLOPS sustained for GEMM, the standard matrix multiply operation [2].

[1] https://www.altera.com/content/dam/altera-www/global/en_US/p...

[2] https://www.altera.com/content/dam/altera-www/global/en_US/p...

1 comment

I would attach a very strong health warning to those numbers. What the second reference doesn't make clear is that this is NOT a standard matrix multiply operation; it's an 11-row by 16-column matrix multiplication.

This is important because, unlike in software, where performance scales well across matrix sizes, on an FPGA you would have to decompose every matrix multiplication into 11x16-style matrix multiplies. They don't mention this overhead in their specs.
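
To give a rough feel for one part of that overhead, here is a minimal sketch that counts how many fixed 11x16 tiles are needed to cover an operand matrix and how much of the padded area is real data. The function name `tile_padding` and the assumption that the operand is simply zero-padded up to whole tiles are mine, not anything from Intel's documents. It ignores data movement and scheduling, so it is a lower bound on the cost at best.

```python
import math

def tile_padding(rows, cols, tile_rows=11, tile_cols=16):
    """Cover a rows x cols matrix with fixed-size tiles (hypothetical model).

    Returns (tiles, utilization), where utilization is the fraction of the
    zero-padded area that holds real matrix elements.
    """
    tiles = math.ceil(rows / tile_rows) * math.ceil(cols / tile_cols)
    padded_elems = tiles * tile_rows * tile_cols
    return tiles, rows * cols / padded_elems

# A few example shapes: an exact fit, a shape one element too large in
# each dimension, and two square matrices.
for shape in [(11, 16), (12, 17), (100, 100), (1024, 1024)]:
    tiles, util = tile_padding(*shape)
    print(shape, tiles, f"{util:.1%}")
```

Under this model a 12x17 matrix already needs four tiles and uses only about 29% of the padded area, while large matrices lose far less to padding. The headline GEMM number tells you nothing about where a given workload falls in that range.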