|
|
|
|
|
by michelpp
819 days ago
|
|
This is very cool and a real interesting read! For those in the comments confused about how this is better, the paper is talking about synthesizing matrix multiplication pipelines in hardware, like an FPGA or ASIC. On a CPU or GPU you won't notice because adds and multiplications take the same amount of time generally, but multiplication units takes up many more transistors, so if you can reduce the circuit complexity you can increase the speed and parallel throughput and reduce power and routing complexity. This approach could be particularly useful for efficient sparse matrix multiplication accelerators. Another cool way to eliminate multiplication in matrix multiplication is to use different semirings [1]. The Tropical Semiring [2] for example substitutes addition for multiplication and min (or max) for addition. It's still matrix multiplication but with substituted binary operations. The research in this relatively new field of Tropical Algebra [3] is quite active and rich right now, being used for all kinds of optimization problems and in research for optimizing neural networks [4]
. This approach also lends itself to hardware synthesis since most FPGA configurable logical blocks can add/min/max in one clock cycle, whereas efficient multiplication requires fixed dedicated on-chip hardware multipliers. Another way to efficiently remove multiplications with a different but related semiring is to use a Log Semiring [5]. If you have to multiply chains of probabilities (like Markov chains) then the numbers quickly become very small and floating point loses its accuracy to represent the numbers. By scaling the numbers first by taking the log, multiplication becomes addition and addition becomes x + log1p(exp(y - x)). [1] https://en.wikipedia.org/wiki/Semiring [2] https://en.wikipedia.org/wiki/Tropical_semiring [3] https://en.wikipedia.org/wiki/Tropical_geometry [4] https://proceedings.mlr.press/v80/zhang18i/zhang18i.pdf [5] https://en.wikipedia.org/wiki/Log_semiring |
|