Hacker News
by chillacy 64 days ago
If SWE-bench is public, then Anthropic is at a minimum probably also looking at their SWE-bench scores when making changes. I'd put more trust in a tracker that runs a private benchmark not known to Anthropic.