by rdsubhas
6 hours ago
IMHO, it's not the oneshotting. It's the "starting from an empty slate" greenfield setup that's the real problem. We used to make fun of engineers who would follow a framework's README, test it on an empty project, and say "this framework is the best for our 10-year-old production app". Greenfield mentality is always the solution to all problems and the problem to all solutions. One should still measure oneshotting, since it's an important self-measurement metric, but against an established, large codebase.
* SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios https://arxiv.org/abs/2512.18470
* SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration https://arxiv.org/abs/2603.03823