by grammarxcore
398 days ago
> Many samples have an issue description that is underspecified, leading to ambiguity on what the problem is and how it should be solved.

OpenAI apparently tuned _basic discovery and refinement_ out of the tests so I don’t think this is a benchmark of anything useful. It can’t replace a human but can possibly make a human more productive.

https://openai.com/index/introducing-swe-bench-verified/