by alephxyz
601 days ago

>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.

They're not even in the top half of the leaderboard, with almost half the score of the first-place agent.