Hacker News
by chillacy 64 days ago
If SWE-bench is public, then Anthropic is probably, at a minimum, also looking at their SWE-bench scores when making changes. I'd trust a tracker that runs a private benchmark unknown to Anthropic more.