|
|
|
|
|
by ahmed-fathi
106 days ago
|
|
No single paper nails that exact claim. SWE-bench Princeton does show that models struggle significantly with real-world issues requiring changes across multiple files and functions which points in that direction. But the local vs global framing is mostly practitioner-observed, not a formally tested hypothesis yet. Fair point, I should have hedged it. https://arxiv.org/abs/2310.06770 |
|