| Have you tried the latest models at best settings? I've been writing software for 20 years. Rust since 10 years. I don't consider myself to be a median coder, but quite above average. Since the last 2 years or so, I've been trying out changes with AI models every couple months or so, and they have been consistently disappointing. Sure, upon edits and many prompts I could get something useful out of it but often I would have spent the same amount of time or more than I would have spent manually coding. So yes, while I love technology, I'd been an LLM skeptic for a long time, and for good reason, the models just hadn't been good. While many of my colleagues used AI, I didn't see the appeal of it. It would take more time and I would still have to think just as much, while it be making so many mistakes everywhere and I would have to constantly ask it to correct things. Now 5 months or so ago, this changed as the models actually figured it out. The February releases of the models sealed things for me. The models are still making mistakes, but their number and severity is lower, and the output would fit the specific coding patterns in that file or area. It wouldn't import a random library but use the one that was already imported. If I asked it to not do something, it would follow (earlier iterations just ignored me, it was frustrating). At least for the software development areas I'm touching (writing databases in Rust), LLMs turned into a genuinely useful tool where I now am able to use the fundamental advantages that the technology offers, i.e. write 500 lines of code in 10 minutes, reducing something that would have taken me two to three days before to half a day (as of course I still need to review it and fix mistakes/wrong choices the tool made). Of course this doesn't mean that I am now 6x faster at all coding tasks, because sometimes I need to figure out the best design or such, but I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings, and not about the tab auto completion or the quick edit features of the IDEs, but the agentic feature where the IDE can actually spend some effort into thinking what I, the user, meant with my less specific prompt. |
So you have to burn tokens at the highest available settings to even have a chance of ending up with code that's not completely terrible (and then only in very specific domains), but of course you then have to review it all and fix all the mistakes it made. So where's the gain exactly? The proper goal is for those 500 lines to be almost always truly comparable to what a human would've written, and not turn into an unmaintainable mess. And AI's aren't there yet.