by jamesaross
496 days ago
I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running:

    ollama run deepseek-r1:32b

The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" time is mostly lower-quality overhead, taking ~1-4 minutes before the Solution/Answer.

You can view the model performance within ollama using the command:

    /set verbose

[0] https://github.com/ollama/ollama

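For anyone who wants the tokens/s figure outside the interactive prompt, here is a rough sketch that asks Ollama's local REST API for a completion and computes the rate from the `eval_count` and `eval_duration` fields of the response. It assumes Ollama is listening on its default port (11434) and that `deepseek-r1:32b` is already pulled; the helper names and the prompt are made up for illustration. The second helper just does the arithmetic on the numbers above: at ~25 tokens/s, 1-4 minutes of "thinking" works out to roughly 1,500-6,000 tokens.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(stats: dict) -> float:
    """Generation rate: eval_count tokens over eval_duration (nanoseconds)."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)


def thinking_tokens(rate: float, seconds: float) -> int:
    """Rough token count produced in `seconds` at `rate` tokens/s."""
    return round(rate * seconds)


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical prompt; any prompt works.
    stats = generate("deepseek-r1:32b", "Why is the sky blue?")
    print(f"{tokens_per_second(stats):.1f} tokens/s")
    print("1 min of thinking at 25 tok/s:", thinking_tokens(25, 60), "tokens")
    print("4 min of thinking at 25 tok/s:", thinking_tokens(25, 240), "tokens")
```

The rate printed this way should be in the same ballpark as what `/set verbose` reports, though I have not compared the two side by side.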
The nice thing about the 32B is that it is as good as the 70B on many benchmarks, according to the DeepSeek documentation:
https://huggingface.co/deepseek-ai/DeepSeek-R1#distilled-mod...