|
|
|
|
|
by oofbaroomf
400 days ago
|
|
Nice to see that Sonnet performs worse than o3 on AIME but better on SWE-Bench. Often, it's easy to optimize math capabilities with RL but much harder to crack software engineering. Good to see what Anthropic is focusing on. |
|