by munksbeer
1 day ago
> Are you seeing a quality increase? Fewer customer bugs, fewer outages, faster resolution? Are you measuring those? We're not at the stage to measure yet. We may be behind others, not sure. Actually, this isn't quite true. I was interested, so I created an ad-hoc report (with AI) on PRs landed per week over time. This has gone up over the last 6 months. But it is hard to say why that is. It might just be that people are raising smaller PRs because it has become easy to have the AI split things up, while before, people were too lazy to do this. Our bottleneck is still that we want humans to review. Sometimes we spot errors, but our pre-existing testing frameworks are very robust already, so if these pass, we're very confident releasing to production, and the agent is excellent at understanding the existing testing frameworks and adding to them for new stuff. So in our team, we don't often see blatant logic errors. It is mostly to do with things like using a pattern that is used elsewhere in the codebase (or not at all) and doesn't belong in our specific section of the code (we have a large monorepo). These become fewer as we enhance our ruleset (AGENTS.md or CLAUDE.md) for our particular developers.

So how can you justify this comment of yours from your reply if you’re not measuring anything? Mind you, I can easily get good results from AI tools, but I don’t like the experience, and the code is often over-engineered and drifts away from my target architecture.
But the worst part is quickly losing sight of the tiny technical details that matter when fixing bugs or altering features. I don’t like typing code. What I like is being able to go directly to the code I need to change, modify it, and then verify that it works. Most of my time is spent thinking deeply about the design of the software, which is orthogonal to code.
And if there is one thing common to people who are fully on board with LLMs, it is that they can talk about the product, but they can’t argue about its behavior and its correctness. They have no internal model that they can compare with the real code. They don’t know the edge cases, the technical pitfalls, or how the software will react if you modify one component. Any brainstorming session quickly turns into a slog because they can no longer contrast approaches. You can see the decay of understanding in real time.