by mluo 489 days ago
Hi, I'm one of the lead authors of this work.

We recommend using bfloat16 (not fp16). Quantization can really hurt performance for small models!
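
For anyone curious about how the two 16-bit formats differ: the comment above doesn't give a reason for the recommendation, but one well-known difference is that bfloat16 keeps the 8 exponent bits of float32 (wide range, about 3 significant decimal digits), while fp16 has 5 exponent bits (finite values up to 65504, slightly finer precision). A minimal sketch in plain Python, assuming finite inputs; the helper names `to_bfloat16` and `to_fp16` are made up for illustration and are not part of the model's code or any library:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 is the top 16 bits of a float32, so we round the float32
    bit pattern and zero out the low 16 bits. NaN/inf are not handled.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round-to-nearest-even bias on the 16 bits that get discarded.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def to_fp16(x: float) -> float:
    """Round a float to IEEE half precision; map overflow to +/-inf."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range.
        return float("inf") if x > 0 else float("-inf")

if __name__ == "__main__":
    for value in (1.001, 100000.0):
        print(value, "bf16:", to_bfloat16(value), "fp16:", to_fp16(value))
```

Large values survive in bfloat16 with coarse rounding but overflow in fp16, while values near 1 are rounded more finely in fp16. Whether range is the reason behind the recommendation here is an assumption on my part.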

3 comments

Have you compared it to the 1.58-bit dynamic quant model based on the original R1 (i.e., not a distillation)? Whatever unsloth did, it doesn't seem to be giving up much reasoning performance compared with the full Q8 version.
It's simply because the model is small (1.5B), which makes it sensitive to weight perturbations.
Is there a GGUF version of your model anywhere that you recommend? I'm on a Mac.
I think some people have made GGUFs as branches of our model. Try them out!

https://huggingface.co/models?other=base_model:quantized:age...

Is there an MLX version that can be added to the fullmoon iOS app?