


	by zozbot234 490 days ago
	Yes, this is just a fine-tuned Llama with DeepSeek-like "chain of thought" generation. A properly 'distilled' model is supposed to be trained from scratch to fully mimic the larger model it is derived from, which is not what's going on here.
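For contrast with plain fine-tuning, here is a minimal sketch of the classic soft-target distillation objective (the Hinton-style formulation, swapped in here purely as an illustration; it is not a description of how the DeepSeek models were actually produced). The student is pushed to match the teacher's full temperature-softened output distribution rather than just the teacher's sampled text. The function names `softmax` and `distillation_loss` and the default temperature of 2.0 are hypothetical choices for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: the student is penalized for any mismatch with the
    teacher's whole distribution, not only the teacher's top-1 token.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# A student that exactly reproduces the teacher's logits incurs zero loss.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

The loss is zero only when the student's softened distribution matches the teacher's, which is the sense of "mimic" in the comment above; supervised fine-tuning on teacher-generated text only sees the tokens the teacher happened to emit.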


kgeist 490 days ago

I tried the smaller 'DeepSeek' models, and to be honest, in my tests the quality wasn't much different from simply adding a chain-of-thought (CoT) prompt to a vanilla model.
