| HN Mirror



	by logicchains 359 days ago
	> They did something to quantize >90% of the model parameters to the MXFP4 format (4.25 bits/parameter) to let the 120B model fit on a single 80GB GPU, which is pretty cool

	They said it was native FP4, suggesting that they actually trained it like that; it's not post-training quantisation.


rushingcreek 359 days ago

The native FP4 is one of the most interesting architectural aspects here IMO, as going below FP8 is known to come with accuracy tradeoffs. I'm curious how they navigated this and how the FP8 weights (if they exist) would perform.


buildbot 359 days ago

One thing to note is that MXFP4 is a block-scaled format, with 4.25 bits per weight: each block of 32 4-bit elements shares one 8-bit scale, so 4 + 8/32 = 4.25. This lets it represent a lot more numbers than raw FP4 would with, say, 1 mantissa and 2 exponent bits.
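To make the block-scaling point concrete, here is a rough Python sketch of how a block-scaled FP4 format can work. It assumes the layout commonly described for MXFP4: blocks of 32 elements, each stored as FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit), with one shared 8-bit power-of-two scale per block. The function names are made up for illustration, the rounding is a simple nearest-value lookup, and nothing here is claimed to match how the model in question was actually trained or stored.

```python
import math

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32   # assumed number of elements sharing one scale
SCALE_BITS = 8    # assumed size of the shared power-of-two scale
ELEM_BITS = 4     # bits per FP4 element
E2M1_EMAX = 2     # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def bits_per_weight():
    """Average storage cost: element bits plus the amortised scale bits."""
    return ELEM_BITS + SCALE_BITS / BLOCK_SIZE


def quantize_block(block):
    """Hypothetical helper: return (shared_exponent, quantized element values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    quantized = []
    for x in block:
        # Scale into FP4 range, clamp to the largest magnitude, round to nearest.
        mag = min(abs(x) / scale, E2M1_VALUES[-1])
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        quantized.append(math.copysign(nearest, x))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Hypothetical helper: multiply each FP4 value by the shared scale."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


print(bits_per_weight())  # 4.25
```

The same eight FP4 magnitudes get reused at a different scale in every block, which is where the extra range comes from: a block whose largest value is 6144 and a block whose largest value is 6 both use the full FP4 grid, just with different shared exponents.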
