| Figuring out how to trust AI-written code faster is the project of software engineering for the next few years, IMO. We'll need to figure out the techniques and strategies that let us merge AI code sight unseen. Some ideas that have already started floating around: - Include the spec for the change in your PR and only bother reviewing that, on the assumption that the AI faithfully executed it - Lean harder on your deterministic verification: unit tests, full-stack tests, linters, formatters, static analysis - Get better AI-based review: Greptile and Bugbot and half a dozen others - Lean into your observability tooling so that AIs can fix your production bugs so fast they don't even matter. None of these seem fully sufficient right now, but it's such a new problem that I suspect we'll be figuring this out for the next few years at least. Maybe one of these becomes the silver bullet or maybe it's just a bunch of lead bullets. But anyone who's able to ship AI code without human review (and without their codebase collapsing) will run circles around the rest. |
For a nontrivial program, two implementations of the same natural language spec will have thousands of observable differences.
Where we are today, with agents still needing guardrails to keep from spinning out, there is no way to let them work on code autonomously without all of those observable differences constantly shifting, and the result is unusable software.
Tests can’t prevent this, because a test suite that covered all observable behavior would need to be more complex than the code itself. At that point it wouldn’t be any easier for a machine or a human to understand than the code is.
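To make that concrete, here is a toy sketch in Python. The spec, the function names, and the data are all made up for illustration. Both functions are reasonable readings of "return the names of the top n players by score", both pass a test that only encodes what the spec says, and they still disagree as soon as there is a tie.

```python
# Hypothetical spec: "return the names of the top n players by score".
# The spec says nothing about ties, so both versions below are "correct".

def top_n_a(scores, n):
    # Stable sort on score only: tied players keep their input order.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def top_n_b(scores, n):
    # Sort on (descending score, name): tied players come out alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

def spec_test(impl):
    # A test that checks only what the spec actually states (no ties here).
    return impl({"ann": 3, "bob": 9, "cy": 5}, 2) == ["bob", "cy"]

# Both implementations pass the spec-level test...
both_pass = spec_test(top_n_a) and spec_test(top_n_b)

# ...but a tie exposes an observable difference the test never pinned down.
tied = {"zed": 7, "amy": 7, "bob": 9}
result_a = top_n_a(tied, 2)
result_b = top_n_b(tied, 2)
differ = result_a != result_b
```

Pinning the tie-breaking down takes another test, and so does every other unstated behavior (empty input, n larger than the list, negative scores, and so on). Multiply that across a real codebase and the test suite starts to rival the code in size.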
The only solution to this problem is for LLMs to get better. Personally, I think that at the point they can pull this off, they can do any white-collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.