Hacker News
by
oofbaroomf
400 days ago
Interesting that Sonnet has a higher SWE-bench Verified score than Opus. Maybe that says something about scaling laws.
2 comments
somebodythere
400 days ago
My guess is that they did RLVR post-training for SWE tasks, and a smaller model can undergo more RL steps for the same amount of computation.
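To make the compute argument concrete, here is a rough back-of-the-envelope sketch. It assumes the cost of one RL update scales linearly with parameter count (the common estimate of about 6 FLOPs per parameter per training token for dense transformers) and ignores rollout generation and verifier cost. The budget, batch size, and parameter counts are purely hypothetical; neither model's size is public.

```python
def rl_steps(budget_flops: float, params: float, tokens_per_step: float) -> float:
    """Approximate number of RL update steps affordable under a fixed budget.

    Assumption: one update costs about 6 * params * tokens_per_step FLOPs.
    Rollout sampling and reward verification are ignored for simplicity.
    """
    flops_per_step = 6.0 * params * tokens_per_step
    return budget_flops / flops_per_step


# Hypothetical values, chosen only to illustrate the scaling.
budget = 1e21   # total post-training compute in FLOPs (made up)
tokens = 1e6    # tokens processed per RL update (made up)

small = rl_steps(budget, params=1e10, tokens_per_step=tokens)  # smaller model
large = rl_steps(budget, params=1e11, tokens_per_step=tokens)  # 10x larger model

# Under this linear-cost assumption, the step ratio equals the inverse
# ratio of parameter counts.
print(f"small: {small:.0f} steps, large: {large:.0f} steps, ratio: {small / large:.1f}")
```

Under these assumptions the smaller model gets as many times more update steps as it has fewer parameters. Whether the extra steps translate into a higher benchmark score is a separate question that this sketch does not address.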
benoittravers
400 days ago
Do you have the link to that benchmark? Can’t see where Sonnet is highlighted.