by DrPhish
517 days ago
Making your own GGUFs is trivial: https://rentry.org/tldrhowtoquant/edit It's a bit harder when the safetensors are provided in FP8, as with the DS3 series, but these smaller distilled models appear to be BF16, so the normal convert/quant pipeline should work fine.
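
For anyone who hasn't done it before, here is a rough sketch of what the convert/quant pipeline usually looks like with llama.cpp. It is not taken from the linked guide. It assumes a local llama.cpp checkout that has already been built, with `convert_hf_to_gguf.py` at the repo root and the `llama-quantize` binary under `build/bin`. The directory names and the `build_quant_commands` helper are hypothetical; adjust them to your setup.

```python
from pathlib import Path


def build_quant_commands(model_dir, out_dir, quant_type="Q4_K_M",
                         llama_cpp_dir="llama.cpp"):
    """Build the two commands of a typical llama.cpp convert/quant run.

    Assumptions (not from the original post): llama.cpp is checked out at
    `llama_cpp_dir` and already built, and `model_dir` holds the BF16
    safetensors downloaded from Hugging Face.
    """
    model_dir = Path(model_dir)
    out_dir = Path(out_dir)
    llama_cpp = Path(llama_cpp_dir)

    name = model_dir.name
    bf16_gguf = out_dir / f"{name}-BF16.gguf"
    quant_gguf = out_dir / f"{name}-{quant_type}.gguf"

    # Step 1: convert the safetensors checkpoint to an unquantized BF16 GGUF.
    convert = [
        "python", str(llama_cpp / "convert_hf_to_gguf.py"),
        str(model_dir),
        "--outfile", str(bf16_gguf),
        "--outtype", "bf16",
    ]

    # Step 2: quantize the BF16 GGUF down to the requested type.
    quantize = [
        str(llama_cpp / "build" / "bin" / "llama-quantize"),
        str(bf16_gguf),
        str(quant_gguf),
        quant_type,
    ]
    return [convert, quantize]
```

To actually run it, loop over the returned commands with `subprocess.run(cmd, check=True)`. Swapping `quant_type` for `Q8_0` gives the larger variant.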
Edit: Running DeepSeek-R1-Distill-Llama-8B-Q8_0 gives me about 3 t/s and destroys system performance on the base M4 mini. Trying the Q4_K_M model next.