Hacker News
by davidheineman 60 days ago
SWE-bench is fantastic! IMO, the scrutiny is a byproduct of the adoption and success of the benchmark.