| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by grammarxcore 398 days ago

> Many samples have an issue description that is underspecified, leading to ambiguity on what the problem is and how it should be solved.

OpenAI apparently tuned _basic discovery and refinement_ out of the tests so I don’t think this is a benchmark of anything useful. It can’t replace a human but can possibly make a human more productive.

https://openai.com/index/introducing-swe-bench-verified/