| HN Mirror

	by alephxyz 601 days ago
	>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.

	They're not even in the top half of the leaderboard. Almost half the score of the first place agent.