| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jhj 2782 days ago

At least 69% more multiply-add flops at the same power iso-process is nothing to sneeze at (we're largely power/heat bound at this point), and unlike normal floating point (IEEE or posit or whatever), multiplication, division/inverse and square root are more or less free power, area and latency-wise. This is not a pure LNS or pure floating point because it is a hybrid of "linear" floating point (FP being itself hybrid log/linear, but the significand is linear) and LNS log representations for the summation.

Latency is also a lot less than IEEE or posit floating point FMA (not in the paper, but the results were only at 500 MHz because the float FMA couldn't meet timing closure at 750 MHz or higher in a single cycle, and the paper had to be pretty short with a deadline, so couldn't explore the whole frontier and show 1 cycle vs 2 cycle vs N cycle pipelined implementations).

The floating point tapering trick applied on top of this can help with the primary chip power problem, which is moving bits around, so you can solve more problems with a smaller word size because your encoding matches your data distribution better. Posits are a partial but not complete answer to this problem if you are willing to spend more area/energy on the encoding/decoding (I have a short mention about a learned encoding on this matter).

A floating point implementation that is more efficient than typical integer math but in which one can still do lots of interesting work is very useful too (providing an alternative for cases where you are tempted to use a wider bit width fixed point representation for dynamic range, or a 16+ bit floating point format).

1 comments

grandmczeb 2782 days ago

The work is definitely great and I have no doubt we'll see new representations used in the future. But at least on the chip I work on, this would be a <5% power improvement in the very best case. For the risk/complexity involved, I would hope for a lot more.

link