| HN Mirror



	by mluo 489 days ago
	Hi, I'm one of the lead authors of this work. We recommend using bfloat16 (not fp16). Quantization can really hurt performance for small models!
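
The comment doesn't say why bfloat16 is preferred over fp16. As a rough illustration only, here is a minimal Python sketch of one well-known difference between the two formats: bfloat16 keeps float32's 8-bit exponent (wide range, coarse precision), while fp16 has a 5-bit exponent (narrow range, finer precision). The helper names `to_bf16` and `to_fp16` are made up for this example, and the bfloat16 emulation truncates instead of rounding, so treat it as a sketch and not as how any particular framework converts values.

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: plain truncation; real hardware usually rounds to nearest.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE half precision; overflow maps to +/-inf."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range (max 65504)
        return float("inf") if x > 0 else float("-inf")

# Range: 70000 is beyond fp16's largest finite value but fits in bfloat16.
big = 70000.0
big_fp16 = to_fp16(big)
big_bf16 = to_bf16(big)

# Precision: near 1.0, fp16 has the finer grid (10 mantissa bits vs 7).
small = 1.001
err_fp16 = abs(to_fp16(small) - small)
err_bf16 = abs(to_bf16(small) - small)

print(big_fp16, big_bf16)
print(err_fp16, err_bf16)
```

So the trade-off is range versus precision. Whether range is the reason behind the authors' recommendation is an assumption here, not something stated in the thread.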

3 comments

CamperBob2 489 days ago

Have you compared it to the 1.58-bit dynamic quant model based on the original R1 (i.e., not a distillation)? Whatever Unsloth did, it doesn't seem to give up much reasoning performance compared to the full Q8 version.


mluo 489 days ago

It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.
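
To make "weight perturbation" a bit more concrete, here is a toy Python sketch that quantizes random weights with plain symmetric round-to-nearest and measures how far the result drifts from the original. Everything in it is an assumption for illustration: the Gaussian weights, the 0.02 standard deviation, and the `quantize` and `relative_error` helpers. It is not Unsloth's dynamic quantization and not the authors' model. It only shows how the size of the perturbation grows as the bit width shrinks; it says nothing about why a 1.5B model tolerates that perturbation worse than a larger one.

```python
import random

def quantize(weights, bits):
    """Symmetric round-to-nearest quantization onto a signed grid, then dequantize."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits, 1 for 2 bits
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def relative_error(weights, approx):
    """L2 norm of the perturbation divided by the L2 norm of the original weights."""
    num = sum((w - a) ** 2 for w, a in zip(weights, approx)) ** 0.5
    den = sum(w ** 2 for w in weights) ** 0.5
    return num / den

random.seed(0)
# Hypothetical stand-in for one weight tensor; not taken from any real model.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err8 = relative_error(weights, quantize(weights, 8))
err2 = relative_error(weights, quantize(weights, 2))

print(f"8-bit relative error: {err8:.4f}")
print(f"2-bit relative error: {err2:.4f}")
```

With a single scale per tensor, the 2-bit grid has only three values, so most weights collapse to zero and the relative error is far larger than at 8 bits. Real low-bit schemes use per-block scales and keep sensitive layers at higher precision, which this sketch deliberately leaves out.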


simonw 489 days ago

Is there a GGUF version of your model anywhere that you recommend? I'm on a Mac.


mluo 489 days ago

I think some people have made GGUFs as branches of our model. Try them out!

https://huggingface.co/models?other=base_model:quantized:age...


newman314 488 days ago

Is there an MLX version that can be added to the fullmoon iOS app?
