by piperswe 390 days ago
How much of that is because the models are optimizing specifically for SWE-bench?
1 comment

Not that much, because it's getting better at all benchmarks.