by zozbot234 490 days ago
Yes, this is just a fine-tuned LLaMa with DeepSeek-like "chain of thought" generation. A properly 'distilled' model is supposed to be trained from scratch to completely mimic the larger model it's being derived from, which is not what's going on here.
1 comment

I tried the smaller 'DeepSeek' models and, to be honest, in my tests the quality wasn't much different from simply adding a CoT prompt to a vanilla model.