by wsdookadr 807 days ago
There are actual SWE jobs where humans sift through this kind of noise. Someone told me they worked such a job recently. It's a good tool to add pressure and raise expectations. Maybe this is the future...

They only know the 22% number because the benchmark includes unit tests that check for a fix. In other words, in a real-world situation, a human would still need to double-check. The patches this tool generates do not include appropriate tests or explanations and would never pass code review by a qualified human.