> They choke on anything that isn’t a greenfield project and consistently produce unwanted results. That was true 8 months ago. It's not true today, because of the one-two punch of modern longer-context "reasoning" models (Claude 4+, GPT-5+) and terminal-based coding agents (Claude Code, Codex CLI). Setting those loose on an existing large project is a very different experience from previous LLM tools. I've watched Claude Code use grep to find potential candidates for a change I want to make, then read the related code, follow back the chain of function calls, track down the relevant tests, make a quick detour to fetch the source code of a dependency directly from GitHub (by guessing the URL to the raw file) in order to confirm a detail, make the change, test the change with an ad-hoc "python -c ..." script, add a new automated test, run the tests and declare victory. That's a different class entirely from what GPT-4o was able to do.
I was decommissioning some code and made the mistake of asking for an "exhaustive" analysis of the areas I needed to remove. Sonnet 4.5 spent 30 minutes looking around and compiling a detailed report on exactly what needed to be removed from this very, very brownfield project. After I reviewed the report, it one-shot the decommissioning of the code (in this case I was using Claude in the Cursor tooling at work). It was overkill, but it was impressive how well it mapped all the ramifications in the code base by grepping around.
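
For anyone curious what "grepping around" amounts to, here is a rough sketch of that kind of sweep in Python. To be clear, this is not what the agent actually ran: `find_references`, the `legacy_export` symbol, and the tiny throwaway project in `demo` are all made up for illustration. It only shows the basic idea of listing every file and line that still mentions a symbol before you delete it.

```python
import re
import tempfile
from pathlib import Path


def find_references(root, symbol, suffixes=(".py",)):
    """Return (relative_path, line_number, stripped_line) for each whole-word match.

    Hypothetical helper: a plain-Python stand-in for `grep -rnw symbol root`.
    """
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    # Sort so the report is stable from run to run.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits


def demo():
    """Build a throwaway two-file project and list references to a made-up symbol."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "billing.py").write_text("def legacy_export():\n    return 1\n")
        (root / "api.py").write_text(
            "from billing import legacy_export\nlegacy_export()\n"
        )
        # Not a .py file, so the default suffix filter skips it.
        (root / "notes.txt").write_text("legacy_export is mentioned here\n")
        return find_references(root, "legacy_export")


if __name__ == "__main__":
    for rel, number, line in demo():
        print(f"{rel}:{number}: {line}")
```

A text match like this only finds literal mentions. It will miss dynamic lookups, config references, and renamed imports, which is presumably why the agent also read the surrounding code and followed the call chain rather than trusting grep alone.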