Hacker News
by skywhopper 807 days ago
Yes, and to be clear, the benchmark used here is merely the 300 simplest problems in the larger benchmark suite, which itself is only a tiny subset of issues from a dozen large (and presumably well-curated) Python projects.

Not to mention that making the code fix is only a small part of resolving an issue. A complete fix should also come with an explanation of the change and added test cases. In other words, I doubt the 22% of “fixes” would pass review by the project owner if a human submitted them.