Hacker News
by rfoo 211 days ago
SWE-bench doesn't even test bugfixing / feature dev properly once you reach roughly 70% without benchmaxxing it.