You can actually get high-quality code out of them -- at least with Claude; I haven't had a great experience with Gemini -- but complex tasks require riding them very, very hard, really understanding where things can go wrong, and poking at them repeatedly. Iterate, iterate, iterate.
That describes my last week. What made it most annoying was the need to release through TestFlight, because the memory issues would not appear when tethered. Also, I was checking in constantly, because I had to revert and reset the context several times.
The problem is that the lines of code are riding on a stack of other dependencies that all need care and feeding. Things reach EOL. Frameworks have major breaking changes. CVEs are discovered.
Yes. It's also why working as a software (host) developer at a hardware company is difficult.
Hardware people insist on treating software the same as firmware.
Bad firmware can cause real-world, physical damage, and be impossible to fix without a hardware recall. A firmware bug can wipe out a hardware company. A software bug can be embarrassing, but can also be corrected a lot more easily (as long as it is being treated differently from firmware).
> So far, LLMs seem to deliver code with "Louie Da Loan Shark"-levels of tech debt.
Maybe a couple of years ago, but these days, Opus 4.8 is frankly writing better software than what I've seen over the previous decades in non-tech enterprise. Over the past two months, we've replaced a great deal of the technical debt we'd been dragging along for the previous 5 years as our team went from 25 to 3 people.
This is in non-tech enterprise in Denmark, and AI had absolutely no impact on us going from 25 to 3. That was all Putin and bad business decisions at the C-level, like keeping flexible loans to fund projects on the books when interest rates were 0.01% because they might go to 0.001%. Anyway, I'm getting to the point where the AI does 100% of the work, but only if it's piloted by people who know what security, resource consumption and compliance are. The code itself is excellent though.
It likely depends on the implementation, and the tool.
In my work, the Swift code is not really that good (but it’s not terrible), but the PHP code is very good (better than mine). I use ChatGPT. Claude might give better Swift, but I’ve invested quite a bit of context in ChatGPT.
Your experience is apparently different than mine. I went from using our corporate tool to copilot cowork when it became available to us. Going from Opus 4.6 to 4.8, there has been a massive difference. It's ridiculously good at programming in the right hands, but the "right hands" part is frankly becoming more and more automatable as well, since you can input design documents, compliance policies and allowed packages and it'll do fine.
If you want, you can go through my history and you'll find that I haven't exactly been a fan of AI, but it's silly to deny that it's gotten good.