by ZeroCool2u
102 days ago
It's a bit concerning that in some cases we see significantly worse results when thinking is enabled, especially for Math, but also in the browser agent benchmark. I'm not sure whether this is more concerning for the test-time compute paradigm or for the underlying model itself. Maybe I'm misunderstanding something, though? I'm assuming 5.4 and 5.4 Thinking are the same underlying model and that's not just marketing.

It's the one you have access to with the top ~$200 subscription, and it's available through the API for a MUCH higher price ($30/$180 vs. $2.5/$15 for standard 5.4, per 1M tokens), but the performance improvement is marginal.
Not sure what it is exactly; I assume it's probably the non-quantized version of the model or something like that.
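
To put the price gap in numbers, here is a rough back-of-the-envelope sketch. It assumes the two figures in each pair are input/output prices per 1M tokens, and that the higher pair belongs to the pricier tier. The tier labels (`standard`, `premium`) and the token counts are made up for illustration.

```python
# Prices in USD per 1M tokens, as quoted in the comment above.
# Assumption: each pair is (input price, output price).
# "standard" and "premium" are hypothetical labels, not official names.
PRICES_PER_1M = {
    "standard": (2.5, 15.0),
    "premium": (30.0, 180.0),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one workload at the given tier's prices."""
    input_price, output_price = PRICES_PER_1M[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_multiplier(input_tokens: int, output_tokens: int) -> float:
    """Return how many times more the premium tier costs for the same workload."""
    return (
        request_cost("premium", input_tokens, output_tokens)
        / request_cost("standard", input_tokens, output_tokens)
    )


if __name__ == "__main__":
    # Made-up workload: 1M input tokens and 200k output tokens.
    for tier in PRICES_PER_1M:
        print(tier, request_cost(tier, 1_000_000, 200_000))
    print("multiplier:", price_multiplier(1_000_000, 200_000))
```

Both the input and the output price differ by the same factor (30 / 2.5 and 180 / 15), so under these assumptions the multiplier comes out the same regardless of the input/output mix.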