| > I see constant scepticism and doubt that LLMs can build anything useful, and whenever provided with examples, the goalposts just move. > I see people telling us that they cannot write decent production code, and this is just wrong. At least for me, that has never been the counterpoint that I’ve been making. I’ve never cared about the code itself, especially with languages like Java and Kotlin, where you basically autocomplete most of the code, with SDKs like iOS, where you can collect snippets for most of the patterns that you need, and with frameworks like Laravel, where most big additions are done with the tooling. And because code is so repetitive, editors like Emacs and Vim have lots of features and plugins to help with copying and pasting (registers, macros, navigation, snippets, …). And the fact is that some code you write today will be worthless tomorrow and will be replaced and deleted. So it’s very rare to care about a particular snippet or patch of code. What I and others have been complaining about is the quality of the codebase and the sustainability of the practice, especially given the associated claims about increased productivity. I care about correctness. Simplicity and a smaller amount of code increase my confidence that I can achieve it. New features, until tested in production, are more likely to decrease the reliability of the software. And with each fix for a bug, I need to make sure that I’m not adding five more. To this day, I’ve not seen any compelling argument that is about writing better code reliably. I’ve seen a lot about writing more code. It’s like a manager thinking that if you’re not at your computer typing, you’re not working. > We're still sceptical enough that we are doing the usual heavy handed human review process, so we're not seeing a huge speed up in delivery times, but we are seeing a volume increase Are you seeing a quality increase? Fewer customer bugs, fewer outages, faster resolution? Are you measuring those? |
We're not at the stage to measure yet. We may be behind others, not sure. Actually, this isn't quite true. I was interested, so I created an ad-hoc report (with AI) on PRs landed per week over time. This has gone up over the last 6 months. But it is hard to say why that is. It might just be that people are raising smaller PRs because it has become easy to have the AI split things up, while before, people were too lazy to do this.
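For anyone who wants to check the "smaller PRs" explanation, here is a rough sketch of that kind of report. It is not the actual report: the function name `weekly_pr_report`, the `(merged_on, lines_changed)` input shape, and the sample numbers are all made up for illustration. It buckets merged PRs by ISO week and reports the median size alongside the count, so a rising count with a falling median would point towards splitting rather than more work landing.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_pr_report(prs):
    """Group merged PRs by ISO week.

    prs: iterable of (merged_on: date, lines_changed: int) tuples.
    This input shape is an assumption; adapt it to whatever your
    code host exports.
    """
    buckets = defaultdict(list)
    for merged_on, lines_changed in prs:
        iso = merged_on.isocalendar()  # (ISO year, ISO week, weekday)
        buckets[(iso[0], iso[1])].append(lines_changed)

    return [
        {
            "week": f"{year}-W{week:02d}",
            "prs": len(sizes),
            "median_lines": median(sizes),
        }
        for (year, week), sizes in sorted(buckets.items())
    ]


# Made-up sample data, only to show the shape of the output.
sample = [
    (date(2025, 1, 6), 400),
    (date(2025, 1, 8), 350),
    (date(2025, 1, 13), 120),
    (date(2025, 1, 14), 90),
    (date(2025, 1, 16), 60),
]

for row in weekly_pr_report(sample):
    print(row)
```

The median is used instead of the mean so that one huge refactoring PR does not distort a week. It still says nothing about quality; for that you would need to join against bug or incident data.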
Our bottleneck is still that we want humans to review. Sometimes we spot errors, but our pre-existing testing frameworks are very robust already, so if these pass, we're very confident about releasing to production, and the agent is excellent at understanding the existing testing frameworks and adding to them for new stuff.
So in our team, we don't often see blatant logic errors. It is mostly to do with things like using a pattern that is used elsewhere in the codebase (or not at all) and doesn't belong in our specific section of the code (we have a large monorepo). These become fewer as we enhance our ruleset (AGENTS.md or CLAUDE.md) for our particular developers.