by mynti
206 days ago
It is interesting that Gemini 3 beats every other model on these benchmarks, mostly by a wide margin, but not on SWE-bench. Sonnet is still king there, and all three look to be basically on the same level. Kind of wild to see them hit such a wall when it comes to agentic coding.
It's probably pretty liberating, because you can build a "spiky" intelligence with only one spike to really focus on.