Hacker News


	by skywhopper 807 days ago
	Yes, and to be clear, the benchmark used here is merely the 300 simplest problems in the larger benchmark suite, which itself is only a tiny subset of issues from a dozen large (and presumably well-curated) Python projects. Not to mention that making the code fix is only a tiny part of resolving an issue. There should also be explanations and added test cases. In other words, I doubt the 22% of “fixes” would pass review by the project owner if a human submitted them.