by zozbot234
490 days ago
Yes, this is just a fine-tuned LLaMa with DeepSeek-like "chain of thought" generation. A properly 'distilled' model is supposed to be trained from scratch to completely mimic the larger model it's being derived from, which is not what's going on here.