| HN Mirror


	by remexre 651 days ago
	Isn't this just taking advantage of "log(x) + log(y) = log(xy)"? The IEEE 754 floating-point representation stores floats as sign, mantissa, and exponent -- ignore the first two (you quantized anyway, right?), and the exponent is just an integer storing the floor of log2() of the float's magnitude.
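To make the "exponent is an integer log" reading concrete, here is a minimal Python sketch. It is not from the paper or the thread, and `exponent` and `exponent_only_product` are names made up for illustration. It assumes positive, finite inputs, reads the binary exponent with `math.frexp`, and "multiplies" by adding exponents while throwing both mantissas away.

```python
import math


def exponent(x: float) -> int:
    """Return floor(log2(x)) for a positive finite float."""
    # math.frexp gives x == m * 2**e with 0.5 <= m < 1,
    # so the IEEE-style exponent (for 1 <= mantissa < 2) is e - 1.
    _, e = math.frexp(x)
    return e - 1


def exponent_only_product(x: float, y: float) -> float:
    """Approximate x * y by adding exponents and ignoring both mantissas."""
    return math.ldexp(1.0, exponent(x) + exponent(y))


if __name__ == "__main__":
    print(exponent(6.0), exponent(10.0), exponent(60.0))
    print(exponent_only_product(6.0, 10.0), 6.0 * 10.0)
```

Since each discarded mantissa lies in [1, 2), the exponent-only result is too small by a factor somewhere in [1, 4): for 6 * 10 it gives 32 rather than 60. That gap is what the replies below are about.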

2 comments

mota7 651 days ago

Not quite: It's taking advantage of (1+a)(1+b) = 1 + a + b + ab. And where a and b are both small-ish, ab is really small and can just be ignored.

So it turns the (1+a)(1+b) into 1+a+b. Which is definitely not the same! But it turns out, machine guessing apparently doesn't care much about the difference.
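A quick way to see how much the dropped `ab` term costs is to compare the two expressions directly. This is only a sketch of the identity in the comment above, with hypothetical helper names, and it treats `a` and `b` as plain fractions in [0, 1) rather than as quantized mantissa bits.

```python
def exact_product(a: float, b: float) -> float:
    """(1 + a)(1 + b), the true mantissa product."""
    return (1.0 + a) * (1.0 + b)


def additive_approx(a: float, b: float) -> float:
    """1 + a + b, i.e. the product with the a*b term dropped."""
    return 1.0 + a + b


def relative_error(a: float, b: float) -> float:
    """Fraction of the true product lost by dropping a*b."""
    exact = exact_product(a, b)
    return (exact - additive_approx(a, b)) / exact


if __name__ == "__main__":
    for a, b in [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]:
        print(a, b, relative_error(a, b))
```

The relative error works out to (a / (1 + a)) * (b / (1 + b)), so it is about 1% for a = b = 0.1, about 11% for a = b = 0.5, and stays below 25% anywhere in [0, 1).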

amelius 651 days ago

You might as well replace the multiplication with addition in the original network, then. In that case you're not even approximating anything.

Am I missing something?

dotnet00 650 days ago

They're applying that simplification to the exponent bits of an 8 bit float. The range is so small that the approximation to multiplication is going to be pretty close.

tommiegannert 651 days ago

Plus the 2^-l(m) correction term.
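Putting the pieces together, a rough sketch of "add the exponents, add the mantissa fractions, then add a constant in place of ab" might look like the following. Everything here is an assumption for illustration: `split` and `approx_mul` are made-up names, `l` is a free parameter standing in for l(m) (whose definition is not given in this thread), inputs are assumed positive and finite, and no rounding to a small mantissa width is modeled.

```python
import math
from typing import Optional, Tuple


def split(x: float) -> Tuple[float, int]:
    """Write positive x as (1 + a) * 2**e and return (a, e), with 0 <= a < 1."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1


def approx_mul(x: float, y: float, l: Optional[int] = None) -> float:
    """Approximate x * y using only additions on exponent and fraction.

    With l=None the a*b term is simply dropped; otherwise the constant
    2**-l is added as a stand-in for it (hypothetical parameter).
    """
    a, ex = split(x)
    b, ey = split(y)
    correction = 0.0 if l is None else 2.0 ** -l
    return math.ldexp(1.0 + a + b + correction, ex + ey)


if __name__ == "__main__":
    print(approx_mul(1.5, 1.5), approx_mul(1.5, 1.5, l=2), 1.5 * 1.5)
    print(approx_mul(6.0, 10.0), approx_mul(6.0, 10.0, l=3), 6.0 * 10.0)
```

The constant is exact whenever ab happens to equal 2^-l (as in both demo lines) and over- or undershoots otherwise; for example, 1.0 * 1.0 with l=2 comes out as 1.25.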

Feels like multiplication shouldn't be needed for convergence, just monotonicity? I wonder how well it would perform if the model was actually trained the same way.

dsv3099i 650 days ago

This trick is used a ton when doing hand calculation in engineering as well. It can save a lot of work.

You're going to have tolerance on the result anyway, so what's a little more error. :)

convolvatron 651 days ago

yes. and the next question is 'ok, how do we add'

kps 651 days ago

Yes. I haven't yet read this paper to see what exactly it says is new, but I've definitely seen log-based representations under development before now. (More log-based than the regular floating-point exponent, that is. I don't actually know the argument behind the exponent-and-mantissa form that's been pretty much universal even before IEEE754, other than that it mimics decimal scientific notation.)

dietr1ch 651 days ago

I guess that if the bulk of the computation goes into the multiplications, you can work in the log-space and simply sum, and when the time comes to actually do a sum on the original space you can go back and sum.
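As a sketch of what working in log space looks like, and of why "how do we add" is the awkward part: multiplication becomes one addition, but addition needs an exp and a log. The helper names are hypothetical, natural logs are used for convenience, and only positive values are handled (signs and zero would need separate bookkeeping).

```python
import math


def log_mul(lx: float, ly: float) -> float:
    """log(x * y) given lx = log(x) and ly = log(y): a single addition."""
    return lx + ly


def log_add(lx: float, ly: float) -> float:
    """log(x + y) given lx = log(x) and ly = log(y).

    Uses log(x + y) = hi + log1p(exp(lo - hi)), which avoids overflow
    but still costs one exp and one log1p per addition.
    """
    hi, lo = max(lx, ly), min(lx, ly)
    return hi + math.log1p(math.exp(lo - hi))


if __name__ == "__main__":
    l3, l4 = math.log(3.0), math.log(4.0)
    print(math.exp(log_mul(l3, l4)))  # close to 12
    print(math.exp(log_add(l3, l4)))  # close to 7
```

So the multiplies in a dot product get cheap, while every accumulation or bias add pays for a transcendental function or a lookup table.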

a-loup-e 651 days ago

Not sure how well that would work if you're often adding bias after every layer
