by sudohackthenews 474 days ago
> smaller distilled version

That’s not technically the full R1 model; it’s talking about the distillations, where DeepSeek trained Qwen and Llama models on R1 output.

1 comment

Then how about DeepSeek R1 GGUF:

> Yes, you can run this model! Your system has sufficient resources (16GB RAM, 12GB VRAM) to run this model.

No mention of distillations. This was definitely either made by AI or by someone picking numbers for the models totally at random.

OK, yeah, that’s just weird.