Hacker News
by rahimnathwani 498 days ago
That model's weights are around 64GB:
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...
GP is likely running the 4-bit quantized version of the finetuned Qwen model.
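As a rough sanity check on that guess, here is a back-of-the-envelope sketch of how much smaller a 4-bit version would be. It assumes the ~64GB of published weights are stored at 16 bits per weight, which is my assumption and not something stated on the model page; `quantized_size_gb` is a hypothetical helper name, not part of any library.

```python
def quantized_size_gb(full_size_gb: float, full_bits: int = 16, quant_bits: int = 4) -> float:
    """Estimate a quantized checkpoint size by scaling with bits per weight.

    Assumes every weight is stored at `full_bits` in the original and at
    `quant_bits` after quantization (an idealized simplification).
    """
    return full_size_gb * quant_bits / full_bits


# ~64GB at an assumed 16 bits per weight, quantized to 4 bits per weight.
print(quantized_size_gb(64))
```

Real 4-bit files usually come out somewhat larger than this estimate, since quantization formats store per-block scales and often keep some tensors at higher precision.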