
	by darthrupert 500 days ago
	Wait, what am I running on my 32GB MacBook then? I thought it was the 32B version of deepseek-r1.

3 comments

RandomBK 500 days ago

The only 32B distill I'm aware of is `DeepSeek-R1-Distill-Qwen-32B`, which would be the `Qwen-32B` base model distilled (further trained) on outputs from the full R1 model.

rahimnathwani 498 days ago

That model's weights are around 64GB: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...

GP is likely running the 4-bit quantized version of the finetuned Qwen model.

rahimnathwani 498 days ago

DeepSeek R1 has 671 billion parameters. Even if you could quantize each parameter to just 1 bit (from 8 bits), you'd still need 84GB of RAM just for the weights. There is no 32B-parameter version of the V3/R1 model architecture.
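
For anyone who wants to check the arithmetic, here is a rough sketch. The helper `weight_size_gb` is a made-up name for illustration, not part of any library. It counts only the raw weights in decimal gigabytes and ignores the KV cache, activations, and the per-block scale data that real quantized files carry, so actual downloads will be somewhat larger. The 16-bit figure for the unquantized distill is my assumption, based on the roughly 64GB size mentioned in this thread.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: ignores quantization metadata,
    KV cache, and activation memory.
    """
    return num_params * bits_per_param / 8 / 1e9


# Full R1: 671 billion parameters at a hypothetical 1 bit each.
full_r1_1bit = weight_size_gb(671e9, 1)

# 32B distill, assuming 16 bits per parameter for the unquantized weights.
distill_16bit = weight_size_gb(32e9, 16)

# 32B distill at 4 bits per parameter, the case that fits in 32GB of RAM.
distill_4bit = weight_size_gb(32e9, 4)

print(f"671B @ 1-bit : {full_r1_1bit:.1f} GB")
print(f"32B  @ 16-bit: {distill_16bit:.1f} GB")
print(f"32B  @ 4-bit : {distill_4bit:.1f} GB")
```

So a 4-bit 32B model leaves headroom on a 32GB machine, while the full model would not fit even at 1 bit per parameter.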

Plankaluel 500 days ago

You are running Qwen2.5 32B that has been fine-tuned on data generated by R1.
