by alephxyz 601 days ago
>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.

They're not even in the top half of the leaderboard. Their score is almost half that of the first-place agent.