by _davide_
18 days ago
I had a subscription before the price was cut. The model kept randomly looping on the same character (burning 30% of the budget in one shot), and its overall performance for agentic purposes is, simply put, terrible.
It finds nonexistent bugs, randomly removes chunks of code to "fix" them, and then even presents that as an "extra fix".
Maybe it's a good general-purpose model; I haven't tested it in that regard. MiniMax (currently 2.7), a ~270B model tuned exclusively for agentic purposes, performs so MUCH better: it's more reliable and cheaper. Both are still far behind Opus 4.7, which I use at work.
IMO benchmarks are just a very rough estimate; everyone cheats as much as they can get away with. Test the model yourself; don't make assumptions based on benchmarks.
I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other, non-agentic purposes as well. Why pay $1 for a general model when, for example, you could pay $0.10 for the content-moderation model you actually need?