
by jamesaross 543 days ago

I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
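For anyone wondering whether a given size will fit, here is a rough back-of-the-envelope sketch. It only estimates the weights. The 4.5 bits per weight figure is my assumption for a typical 4-bit quantization, not something taken from Ollama, and the function name is made up for illustration. The KV cache and runtime buffers come on top, which would account for the gap between this estimate and the 23GB actually used.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# ASSUMPTION: ~4.5 bits per weight for a 4-bit quantized model.
ASSUMED_BITS_PER_WEIGHT = 4.5

for tag, params_billion in [("14b", 14), ("32b", 32)]:
    gib = estimate_weight_gib(params_billion, ASSUMED_BITS_PER_WEIGHT)
    print(f"deepseek-r1:{tag}: ~{gib:.1f} GiB for weights")
```

Under those assumptions the 32B weights come to roughly 17 GiB, which leaves some headroom on a 24GB card for context.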

The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly lower-quality overhead that takes ~1-4 minutes before the Solution/Answer appears.

You can view the model performance within ollama using the command: /set verbose
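If you would rather get the same number from a script, the Ollama REST API reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds) in the final response of a generate call. A minimal sketch, where the helper name and the sample values are hypothetical and chosen to match the ~25 tokens/s above:

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama generate response.

    eval_count is the number of generated tokens and
    eval_duration is the generation time in nanoseconds.
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Hypothetical sample values, not a real measurement.
sample_response = {"eval_count": 500, "eval_duration": 20_000_000_000}
print(f"{tokens_per_second(sample_response):.1f} tokens/s")
```

The dict here stands in for the parsed JSON of a non-streaming request, so no server is needed to try it.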

[0] https://github.com/ollama/ollama

2 comments

waltercool 543 days ago

Yup, this is what deepseek does.

The good thing about the 32B is that it is as good as the 70B on many benchmarks, according to the DeepSeek documentation.

https://huggingface.co/deepseek-ai/DeepSeek-R1#distilled-mod...


stonecharioteer 542 days ago

I've been running 32b as well.

But I cannot find it in LM Studio. What am I doing wrong that I only find distilled models?


nickthegreek 542 days ago

32b is a distilled model. Only the 671b is not.
