HN Mirror

Hacker News


	by petters 57 days ago
	That's a good idea, and it exists: https://www.johndcook.com/blog/2026/04/18/qlora/

	It seems quite wasteful to have two zeros when you only have 4 bits in total.


saulpw 57 days ago

OTOH, it seems quite plausible that the most important numbers to represent are:

   +0
   -0
   +1
   -1
   +inf
   -inf


parsimo2010 57 days ago

In standard FP32, infinities are represented by a sign bit, all exponent bits = 1, and all mantissa bits = 0. NaNs are represented by a sign bit, all exponent bits = 1, and a non-zero mantissa. If you used that interpretation with FP4, you'd get the table below, which restricts the representable range to ±3, and that feels less useful to me. If you're using FP4, you're probably optimizing for space and don't want to waste a quarter of your possible bit patterns on things that aren't actually numbers; you'd more likely focus your efforts on writing code that doesn't need to represent inf and NaN.

  Bits s exp m  Value
  -------------------
  0000 0  00 0     +0
  0001 0  00 1   +0.5
  0010 0  01 0     +1
  0011 0  01 1   +1.5
  0100 0  10 0     +2
  0101 0  10 1     +3
  0110 0  11 0   +inf
  0111 0  11 1    NaN
  1000 1  00 0     -0
  1001 1  00 1   -0.5
  1010 1  01 0     -1
  1011 1  01 1   -1.5
  1100 1  10 0     -2
  1101 1  10 1     -3
  1110 1  11 0   -inf
  1111 1  11 1    NaN
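To make the table concrete, here is a minimal Python sketch of a decoder for this 1 sign, 2 exponent, 1 mantissa bit (E2M1) layout, assuming an exponent bias of 1 and IEEE-754-style special values. The function name and the `special_values` flag are illustrative, not taken from any library. With the flag off, the top exponent is treated as ordinary numbers instead; as far as I know, that matches the FP4 E2M1 element type in the OCP Microscaling (MX) spec, where those codes give ±4 and ±6 and there is no inf or NaN.

```python
import math

def decode_e2m1(code: int, special_values: bool = True) -> float:
    """Decode a 4-bit code laid out as sign | 2 exponent bits | 1 mantissa bit.

    Assumes an exponent bias of 1. With special_values=True, exponent 0b11
    encodes inf (mantissa 0) or NaN (mantissa 1), as in the table above.
    """
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if special_values and exp == 0b11:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, scale fixed at 2**(1 - bias) = 1
        return sign * (man * 0.5)
    # Normal: implicit leading 1, scale 2**(exp - bias)
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)


ieee_like = [decode_e2m1(c) for c in range(16)]
non_numbers = sum(not math.isfinite(v) for v in ieee_like)
print(ieee_like)
print(f"{non_numbers} of 16 codes are inf or NaN")
print(max(decode_e2m1(c, special_values=False) for c in range(16)))
```

Four of the sixteen codes come out as inf or NaN, which is the quarter of the bit patterns mentioned above; reclaiming them raises the largest finite value from 3 to 6.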


saulpw 56 days ago

I can see the most important values being:

   ± 0 (infinitesimal)
   ± 10^-2n
   ± 10^-n
   ± 1 (unity)
   ± 10^n
   ± 10^2n
   ± infinity

For fp4, this leaves 2 values. Maybe one of them should be NaN. What should the other one be?
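One way to try a list like this is a lookup-table (codebook) format: store a 4-bit index and map it to a value. The Python sketch below builds the 16-entry table for a chosen decade step `n` and rounds inputs to the nearest entry on a log10 scale. Everything here is hypothetical: the code order, the rounding rule (±0 and ±inf are placed one more step of `n` decades beyond the smallest and largest finite magnitudes), and filling both spare codes with NaN as placeholders, since what the last value should be is left open.

```python
import math

def build_codebook(n: int = 1) -> list[float]:
    """Hypothetical 16-entry table: codes 0-6 are +0, +10^-2n, +10^-n, +1,
    +10^n, +10^2n, +inf; codes 7-13 are their negatives; 14-15 are spare."""
    mags = [0.0, 10.0 ** (-2 * n), 10.0 ** (-n), 1.0, 10.0 ** n, 10.0 ** (2 * n), math.inf]
    spare = [math.nan, math.nan]  # placeholders for the two unassigned codes
    return mags + [-m for m in mags] + spare


def encode(x: float, n: int = 1) -> int:
    """Round x to the nearest table entry on a log10 scale (hypothetical rule)."""
    if math.isnan(x):
        return 14  # use the first spare code as NaN
    a = abs(x)
    if a == 0.0:
        idx = 0
    elif math.isinf(a):
        idx = 6
    else:
        # Log10 position of each magnitude; 0 and inf are placed one more
        # step of n decades beyond the smallest and largest finite entries.
        positions = [-3 * n, -2 * n, -n, 0, n, 2 * n, 3 * n]
        pos = math.log10(a)
        idx = min(range(7), key=lambda i: abs(positions[i] - pos))
    # Keep the sign, including the sign of zero.
    return idx + (7 if math.copysign(1.0, x) < 0 else 0)


table = build_codebook(n=1)
for x in (0.5, -0.03, 150.0, 1e9, 1e-5, -0.0):
    print(x, "->", table[encode(x)])
```

Rounding on a log scale is used because the entries are spaced by factors of 10^n; a linear nearest-value rule would snap almost everything to a handful of entries.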


Dwedit 57 days ago

Why waste a slot on -0?


adampunk 57 days ago

You need it if you want total ordering over the extended reals. There's +/- infinity (an affine closure, not a projective one with a single point at infinity), so to make that math work you need to give 0 a sign.
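For what it's worth, IEEE 754 defines a totalOrder predicate in which -0 sorts before +0, even though -0 == +0 compares true. A common way to get that ordering for binary64 values is to reinterpret the bits as a signed integer and flip the non-sign bits of negative values. Here is a minimal Python sketch of that trick; the function name is illustrative, and only non-NaN inputs are exercised.

```python
import math
import struct

def total_order_key(x: float) -> int:
    """Sort key that orders binary64 values like IEEE 754 totalOrder
    (checked here only for non-NaN values): -inf < ... < -0.0 < +0.0 < ... < +inf."""
    # Reinterpret the 64 bits of the double as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Non-negative doubles already sort correctly as integers. For negative ones,
    # flip every bit except the sign so larger magnitudes sort lower.
    return bits ^ ((bits >> 63) & 0x7FFF_FFFF_FFFF_FFFF)


values = [1.0, -0.0, math.inf, 0.0, -1.0, -math.inf, 0.5]
ordered = sorted(values, key=total_order_key)
print(ordered)
print(-0.0 == 0.0, total_order_key(-0.0) < total_order_key(0.0))
```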


saulpw 57 days ago

Because it means "infinitesimal negative", which is distinct from "infinitesimal positive".
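In IEEE 754 arithmetic this reading shows up directly: a negative result too small to represent underflows to -0 rather than +0, so the sign survives even though the magnitude is lost. A small Python illustration (binary64, not FP4, assuming an IEEE-style FP4 would round the same way):

```python
import math

# Products far below the smallest subnormal (about 5e-324) underflow to zero,
# but IEEE 754 keeps the sign of the exact result.
tiny_negative = -1e-300 * 1e-300
tiny_positive = 1e-300 * 1e-300

print(tiny_negative == tiny_positive)        # the two zeros compare equal
print(math.copysign(1.0, tiny_negative))     # yet the sign is still recoverable
# Some functions do distinguish them: atan2 picks the branch from the sign of zero.
print(math.atan2(tiny_negative, -1.0), math.atan2(tiny_positive, -1.0))
```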


Dylan16807 57 days ago

That sounds pretty niche. What's a use case where you have fewer than 8 bits and that distinction is more important than having an extra finite value? I don't think AI is one.


jlokier 57 days ago

For neural-net gradient descent, automatic differentiation, etc., the widely used ReLU function has information-carrying derivatives at +0 and -0 if those are treated as infinitesimals.
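A minimal Python sketch of what that could look like, under a hypothetical convention that reads +0 as a tiny positive input (ReLU slope 1) and -0 as a tiny negative input (slope 0), rather than picking a single subgradient at exactly x == 0. The function names are illustrative only.

```python
import math

def relu(x: float) -> float:
    return x if x > 0 else 0.0

def relu_grad_signed_zero(x: float) -> float:
    """ReLU derivative under a hypothetical convention that reads signed zeros
    as infinitesimals: +0.0 behaves like a tiny positive input (slope 1) and
    -0.0 like a tiny negative input (slope 0)."""
    # copysign(1.0, x) is +1.0 for x > 0 and for +0.0, and -1.0 for x < 0 and -0.0
    return 1.0 if math.copysign(1.0, x) > 0 else 0.0

for x in (2.0, 0.0, -0.0, -2.0):
    print(x, relu(x), relu_grad_signed_zero(x))
```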


Dylan16807 57 days ago

Barely any information. After surviving ReLU, that signed zero is probably going to be added to another value, and then, oops, the information is gone. It sounds a lot worse than properly spaced values.
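A quick Python check of how fragile the sign is under ordinary IEEE 754 addition (binary64, default round-to-nearest); `sign_bit` is just an illustrative helper:

```python
import math

def sign_bit(x: float) -> str:
    """Report the sign bit of x as '+' or '-' (works for signed zeros too)."""
    return "-" if math.copysign(1.0, x) < 0 else "+"

neg_zero = -0.0  # e.g. a "negative infinitesimal" that survived ReLU
for other in (0.0, 0.25, -0.0):
    total = neg_zero + other
    print(f"{neg_zero!r} + {other!r} = {total!r} (sign {sign_bit(total)})")
```

Under round-to-nearest, -0 + +0 is +0, and adding any nonzero value swamps the zero entirely; only -0 + -0 keeps the negative sign.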
