HN Mirror

Hacker News


	by randomifcpfan 503 days ago
	In my application (code generation), the distilled DeepSeek models (7B to 70B) perform poorly. They imitate the reasoning of the R1 model, but their conclusions are not correct. The real R1 model is great, better than o1, but the distilled models are not even as good as the base models they were built on.