| HN Mirror



	by rahimnathwani 498 days ago
	That model's weights are around 64 GB: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-... GP is likely running the 4-bit quantized version of the fine-tuned Qwen model.
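
	For a rough sense of the arithmetic behind that guess, here is a minimal sketch. It assumes the 64 GB checkpoint stores weights at 16 bits each (an assumption, not stated above) and ignores quantization overhead such as scales and any layers left unquantized. The function name `quantized_size_gb` is made up for illustration.

```python
def quantized_size_gb(full_size_gb: float, source_bits: int = 16, target_bits: int = 4) -> float:
    """Scale a checkpoint size by the ratio of bits per weight.

    Assumes every weight is stored at `source_bits` in the original checkpoint
    and at `target_bits` after quantization; real quantized files are somewhat
    larger because of per-block scales and unquantized layers.
    """
    return full_size_gb * target_bits / source_bits


# 64 GB at an assumed 16 bits per weight, requantized to 4 bits per weight
print(quantized_size_gb(64))  # 16.0
```

	Under those assumptions the 4-bit version comes to about a quarter of the original size, which is far easier to fit on consumer hardware than the full 64 GB.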