Hacker News new | ask | show | jobs
by jablongo 1058 days ago
It has to do with the precision of the values stored in those rows and columns. If they could be coerced into a narrower range (without losing information) then we could effectively store them each with 8 bits or something. The +1 prevents blowups when the denominator in its current form approaches 0, and without those blowups, then we can use less bits, in theory.
1 comments

That is only true if the using the new softmax changes the dynamic range of the values. We are using floating point not fixed point. So if before our values went from 1 to 5000 and now they go from 0.0002 to 1 we still have the same dynamic range and so still need the same resolution.
The quantized versions are not floats but ints.