
by sifar 456 days ago

I have used Karatsuba's algorithm and Winograd's inner-product algorithm [0] in my work on wide multi-SIMD integer multipliers and matrix-multiplication HW for DSPs. The latter cuts the multiplications roughly in half - about n^3/2 instead of n^3 for an n x n matrix product. I think the paper talks about its derivative, FFIP.
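For readers who haven't seen the inner-product trick, here is a minimal software sketch of it, assuming an even inner dimension and plain Python lists. The function names and the multiplication counter are made up for illustration; this is not the HW design described above and not the paper's FFIP variant. The idea is that the pairwise products within each row of A and each column of B are computed once and reused, so each output element only needs n/2 multiplications.

```python
def naive_matmul(A, B):
    """Reference product: one multiplication per (i, k, j) triple."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]


def winograd_matmul(A, B):
    """Multiply A (m x n) by B (n x p) with Winograd's inner-product identity.

    Returns (C, mults), where mults counts scalar multiplications.
    Assumes n is even (pad with a zero column/row otherwise).
    """
    m, n, p = len(A), len(B), len(B[0])
    if n % 2:
        raise ValueError("inner dimension must be even")
    half = n // 2
    mults = 0

    # Per-row correction terms: sum of a[2j] * a[2j+1], computed once per row.
    row_term = []
    for i in range(m):
        s = 0
        for j in range(half):
            s += A[i][2 * j] * A[i][2 * j + 1]
            mults += 1
        row_term.append(s)

    # Per-column correction terms: sum of b[2j] * b[2j+1], once per column.
    col_term = []
    for k in range(p):
        s = 0
        for j in range(half):
            s += B[2 * j][k] * B[2 * j + 1][k]
            mults += 1
        col_term.append(s)

    # Main loop: n/2 multiplications per output element instead of n.
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for k in range(p):
            s = 0
            for j in range(half):
                s += (A[i][2 * j] + B[2 * j + 1][k]) * (A[i][2 * j + 1] + B[2 * j][k])
                mults += 1
            C[i][k] = s - row_term[i] - col_term[k]
    return C, mults
```

For square n x n inputs the count works out to n^3/2 + n^2 multiplications, at the cost of extra additions, which is why it trades multiplier area for adders.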

The issue is memory bandwidth. These techniques do save multiplier area; however, performance is still bandwidth limited - you'd need to feed more data per cycle to increase performance.
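The bandwidth point can be illustrated with a simple roofline-style bound. This is only a rough sketch: the function name, the parameters, and every number in the usage note are hypothetical and not taken from the paper or from any real DSP.

```python
def attainable_macs_per_cycle(peak_macs_per_cycle, bytes_per_cycle, macs_per_byte):
    """Roofline-style upper bound on sustained throughput.

    peak_macs_per_cycle: what the multiplier array could do if always fed
    bytes_per_cycle:     data the memory system can deliver per cycle
    macs_per_byte:       reuse, i.e. MACs performed per byte fetched
    All three are assumed, illustrative parameters.
    """
    return min(peak_macs_per_cycle, bytes_per_cycle * macs_per_byte)
```

With made-up values of 1024 peak MACs/cycle, 64 bytes/cycle, and 8 MACs/byte, the bound is 512 MACs/cycle. Doubling the peak to 2048 leaves it at 512, while doubling the bandwidth to 128 bytes/cycle raises it to 1024.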

One thing the paper doesn't talk about is energy. For DNNs, at the network level, the energy consumed by integer MACs is not that high. Localizing data computation would have a much greater impact on energy reduction than optimizing MACs.

[0] https://ieeexplore.ieee.org/document/1687427