| HN Mirror


	by RandomBK 501 days ago
	Reminder: DeepSeek distilled models are better thought of as fine-tunes of Qwen/Llama using DeepSeek output, and are not the same as actual DeepSeek v3 or R1. This unfortunate naming has sown plenty of confusion around DeepSeek's quality and resource requirements. Actual DeepSeek v3/R1 continues to require at least ~100GB of VRAM/Mem/SSD, and this does not change that.

2 comments

bestouff 499 days ago

Out of curiosity, would an A100 80GB work for this?

bestouff 499 days ago

Replying to myself: apparently it's not 100GB of VRAM but closer to 700GB that's needed to run DeepSeek R1. The gear needed to run that would cost something in the vicinity of 100K€!

RandomBK 499 days ago

Yup. I was referring to the 1.58-bit quant, which seemed to be performing alright and would be the smallest real-DeepSeek model. That requires ~140GB, which is just barely doable on a 128GB RAM + 24GB VRAM setup plus a lot of patience. Others have made it work with 64GB RAM + a fast SSD.

The true minimally-quantized DeepSeek experience will need one or possibly two 8xH100 nodes, so well upwards of $100K in CapEx.
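
As a rough sanity check on the setups mentioned in this subthread, here is a minimal Python sketch. It assumes the ~140GB footprint quoted above, treats RAM and VRAM as one pool, and ignores KV cache, activations, and OS overhead, so real headroom is smaller. The name `fits_in_memory` is made up for illustration.

```python
def fits_in_memory(required_gb: float, ram_gb: float = 0.0, vram_gb: float = 0.0) -> bool:
    """Return True if combined RAM + VRAM covers the model footprint.

    Simplification: ignores KV cache, activations, and OS overhead.
    """
    return ram_gb + vram_gb >= required_gb


# Figure quoted in the comment above, not measured here.
QUANT_FOOTPRINT_GB = 140

# Hypothetical setups taken from the hardware mentioned in the thread.
setups = {
    "single A100 80GB": dict(vram_gb=80),
    "128GB RAM + 24GB VRAM": dict(ram_gb=128, vram_gb=24),
    "64GB RAM only": dict(ram_gb=64),
}

for name, mem in setups.items():
    verdict = "fits" if fits_in_memory(QUANT_FOOTPRINT_GB, **mem) else "does not fit"
    print(f"{name}: {verdict}")
```

The 64GB case fails this check, which is consistent with the comment: it only works by streaming weights from a fast SSD, at a cost in speed.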

darthrupert 501 days ago

Wait, what am I running on my 32GB MacBook then? I thought it was the 32b version of deepseek-r1.

RandomBK 501 days ago

The only 32B distill I'm aware of is `DeepSeek-R1-Distill-Qwen-32B`, which would be a base model of `Qwen-32B` distilled (further trained) on outputs from the full R1 model.

rahimnathwani 499 days ago

That model's weights are around 64GB: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...

GP is likely running the 4-bit quantized version of the fine-tuned Qwen model.
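
The arithmetic behind that guess, as a small Python sketch. It assumes a round 32 billion parameters, 16 bits per weight for the unquantized checkpoint, and decimal gigabytes. Real 4-bit formats also store scaling data, so actual files come out somewhat larger than this estimate.

```python
def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


DISTILL_PARAMS = 32e9   # assumed round figure for the 32B distill
MACBOOK_RAM_GB = 32     # the machine mentioned upthread

full_precision_gb = weights_gb(DISTILL_PARAMS, 16)  # unquantized 16-bit weights
four_bit_gb = weights_gb(DISTILL_PARAMS, 4)         # idealized 4-bit quant

for label, size in [("16-bit", full_precision_gb), ("4-bit", four_bit_gb)]:
    fits = size <= MACBOOK_RAM_GB
    print(f"{label}: {size:.0f} GB of weights, fits in {MACBOOK_RAM_GB} GB: {fits}")
```

Under these assumptions only the 4-bit variant leaves room on a 32GB machine, and that is before counting context and the OS.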

rahimnathwani 499 days ago

DeepSeek R1 has 671 billion parameters. Even if you could quantize each parameter to just 1 bit (from 8 bits), you'd still need 84GB of RAM just for the weights. There is no 32B parameter version of the V3/R1 model architecture.
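
To make that calculation reproducible, here is a hedged Python sketch. It uses the 671 billion parameter count from the comment, assumes decimal gigabytes, and counts weights only. The helper `max_bits_per_param` is a hypothetical name that inverts the same formula to ask how few bits per weight a given memory budget would demand.

```python
R1_PARAMS = 671e9  # parameter count stated in the comment above


def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def max_bits_per_param(n_params: float, budget_gb: float) -> float:
    """Largest average bit width whose weights alone fit in budget_gb."""
    return budget_gb * 1e9 * 8 / n_params


print(f"1 bit/param:  {weights_gb(R1_PARAMS, 1):.0f} GB")
print(f"8 bits/param: {weights_gb(R1_PARAMS, 8):.0f} GB")
print(f"Bits/param needed to fit 32 GB: {max_bits_per_param(R1_PARAMS, 32):.2f}")
```

The last figure comes out below one bit per parameter, which is another way of seeing why nothing with the full V3/R1 architecture can be what runs on a 32GB laptop.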

Plankaluel 501 days ago

You are running Qwen2.5 32B that has been fine-tuned on data generated by R1.
