|
|
|
|
|
by agentultra
259 days ago
|
|
I don’t think people are good at self-reporting the “boost” it gives them. We need more empirical evidence. And historically we’re really bad at running such studies and they’re usually incredibly expensive. And the people with the money aren’t interested in engineering. They generally have other motives for allowing FUD and hype about productivity to spread. Personally I don’t see these tools going much further than where they are now. They choke on anything that isn’t a greenfield project and consistently produce unwanted results. I don’t know what magic incantations and combinations of agents people have got set up but if that’s what they call “engineering,” these days I’m not sure that word has any meaning anymore. Maybe these tools will get there one day but don’t go holding your breath. |
|
That was true 8 months ago. It's not true today, because of the one-two punch of modern longer-context "reasoning" models (Claude 4+, GPT-5+) and terminal-based coding agents (Claude Code, Codex CLI).
Setting those loose an an existing large project is a very different experience from previous LLM tools.
I've watched Claude Code use grep to find potential candidates for a change I want to make, then read the related code, follow back the chain of function calls, track down the relevant tests, make a quick detour to fetch the source code of a dependency directly from GitHub (by guessing the URL to the raw file) in order to confirm a detail, make the change, test the change with an ad-hoc "python -c ..." script, add a new automated test, run the tests and declare victory.
That's a different class entirely from what GPT-4o was able to do.