by logicchains
312 days ago
> They did something to quantize >90% of the model parameters to the MXFP4 format (4.25 bits/parameter) to let the 120B model fit on a single 80GB GPU, which is pretty cool

They said it was native FP4, suggesting that they actually trained it like that; it's not post-training quantisation.
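For anyone wondering where the 4.25 bits/parameter figure comes from: in the OCP Microscaling (MX) format, MXFP4 stores 4-bit elements in blocks of 32 that share one 8-bit scale, so the cost is 4 + 8/32 = 4.25 bits per parameter. Below is a rough sketch of the weights-only arithmetic. The round 120B parameter count, the 16-bit width for the unquantized remainder, and the function names are my own assumptions for illustration, not figures from the announcement.

```python
# Back-of-the-envelope weight memory for a partly MXFP4-quantized model.
# Assumptions (not from the announcement): exactly 120e9 parameters,
# unquantized parameters stored in 16 bits, activations and KV cache ignored.

def mxfp4_bits_per_param(element_bits=4, scale_bits=8, block_size=32):
    """Effective bits per parameter: element bits plus the shared scale
    amortized over one block."""
    return element_bits + scale_bits / block_size


def weight_gigabytes(n_params, quant_fraction, other_bits=16):
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    quant_fraction is the share of parameters stored as MXFP4;
    the rest are assumed to use other_bits each (hypothetical choice).
    """
    quant_bits = n_params * quant_fraction * mxfp4_bits_per_param()
    rest_bits = n_params * (1.0 - quant_fraction) * other_bits
    return (quant_bits + rest_bits) / 8 / 1e9


if __name__ == "__main__":
    print(mxfp4_bits_per_param())
    for fraction in (0.90, 0.95, 1.00):
        print(fraction, round(weight_gigabytes(120e9, fraction), 1))
```

Under those assumptions, exactly 90% quantized comes out at roughly 81.4 GB of weights, so the quantized share would need to be somewhat above 90% to fit in 80GB, which is consistent with the ">90%" wording.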