by csomar
511 days ago
I think that because they are trained on Claude/o1 output, they tend to have comparable performance. The small models quickly fail on complex reasoning; the larger the model, the better the reasoning. I wonder, however, whether you can hit a sweet spot at 100 GB of RAM. That's enough for most professionals to be able to run it on an M4 laptop, and it would be a death sentence for OpenAI and Anthropic.

Why does having comparable performance indicate having been trained on a preexisting model's output?
I read a similar claim in relation to another model in the past, so I'm just curious how this works technically.