Hacker News
by ahmed-fathi 106 days ago
No single paper nails that exact claim. SWE-bench (Princeton) does show that models struggle significantly with real-world issues that require changes across multiple files and functions, which points in that direction. But the local vs. global framing is mostly practitioner observation, not a formally tested hypothesis yet. Fair point, I should have hedged it. https://arxiv.org/abs/2310.06770