Although I write very little code myself anymore, I don't trust AI code at all. My default assumption: every line is the most mid possible implementation, every important architecture constraint violated wantonly. Your typical junior programmer.
So I run specialized compliance agents regularly. I watch the AI code and interrupt frequently to put it back on track. I occasionally write snippets as few-shot examples. Verification without reading every line, but not "vibe checking" either.
I like this. The few-shot example snippet method is something I’d like to incorporate in my workflow, to better align generated code with my preferences.
I have written a research paper on another interesting prompting technique that I call axiomatic prompting. On objectively measurable tasks, when an AI scores below 70%, including clear axioms in the prompt systematically increases success.
In coding this would translate to: when trying to impose a pattern or architecture that is different enough from the "mid" programming approach the AI defaults to, including axioms about the approach (in an IF this THEN that style, as opposed to few-shot examples) will improve success.
The key is the 70% threshold: if the model already has enough training data, axioms hurt. If the model is underperforming because the training set did not have enough examples (for example, hyperscript), axioms help.
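To make the threshold rule concrete, here is a minimal sketch of how the gating could look in a prompt builder. The names `Axiom` and `build_prompt`, the prompt wording, and the example rule are all hypothetical and not taken from the paper; the only thing borrowed from the comment above is the idea of adding IF/THEN axioms when a measured baseline score is below 70% and leaving the prompt alone otherwise.

```python
from dataclasses import dataclass


@dataclass
class Axiom:
    """One IF/THEN rule about the target pattern (hypothetical structure)."""
    condition: str
    consequence: str

    def render(self) -> str:
        return f"IF {self.condition} THEN {self.consequence}"


def build_prompt(task: str, axioms: list[Axiom], baseline_score: float,
                 threshold: float = 0.70) -> str:
    """Prepend axioms only when the measured baseline is below the threshold."""
    # At or above the threshold the model is assumed to know the pattern
    # already, so the task is sent unchanged.
    if baseline_score >= threshold or not axioms:
        return task
    rules = "\n".join(f"{i}. {a.render()}" for i, a in enumerate(axioms, 1))
    return f"Follow these axioms:\n{rules}\n\nTask: {task}"


# Illustrative rule only; a real project would supply its own constraints.
axioms = [Axiom("a handler needs database access",
                "call the repository layer, never the driver directly")]
print(build_prompt("Add an endpoint that lists orders", axioms, baseline_score=0.45))
```

The baseline score has to come from an objective measurement of the task without axioms, which is the part this sketch leaves out.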
Compilers have a finite set of inputs and outputs that should generate reproducible results. With AI there is a larger number of possible outputs for the same question and very little reproducibility.
Yes but once the code is written it’s not going to magically change. I am going to test the code just like I would test something I wrote - again like I’ve been doing for 40 years when writing my code by hand.
But your thought process during coding influences your testing. At least for most of us, we find edge cases or points of concern during coding that we place extra focus on in testing.
This is different than what you've done for the past 40 years because you're not testing your code. This would be analogous to you testing someone else's code. The vast majority of people and places have not followed that paradigm until AI showed up.
My thought process during my architecture influences my testing.
Since AI has been a thing, I’ve been in a customer facing cloud consulting role - working full time at consulting departments (AWS ProServe) and now a third party company - specializing in app dev.
Before my hands actually write a line of code or infrastructure as code, I’ve already spoken to sales to get a high level idea of what the customer wants, read over the contract (SoW) to see what questions I have, done discovery sessions/requirements analysis, created architecture diagrams, done a design review, created detailed stories/workstreams (epics), thought about all the ways things can go wrong, etc.
I very much keep my hands on the wheel and treat AI as a junior coder that might not follow my instructions. I can answer any question about architectural decisions, repo structure, what any Lambda does, the naming conventions, etc.
I’ve also intuited “these are the things that I need to think about and test for from my 30 years of professional experience as a developer and 8 years of experience across literally dozens of AWS implementations”.
In the before times, if I were doing this without AI, I would have to have two or three more junior people doing the work just because I couldn’t physically do it in 40 hours a week. Even then I would be focused on how it works and look for corner cases.
I don’t have to think about what I need to test for. I did specifically call out concurrency because there are subtle bugs.
Ironically, what I am working on now had a subtle concurrent locking bug that Codex wrote. I threw the code into ChatGPT thinking mode and it found it immediately and suggested better alternatives. I also have Claude and Codex cross check each other.
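For readers wondering why this class of bug gets called out separately, here is a generic sketch of a lost-update race and a stress test for it. This is not the bug Codex wrote (the thread doesn't describe it); `Counter` and `hammer` are made-up names for illustration. The point is that a single-threaded input/output check passes whether or not the lock is there, so the test has to run the code concurrently.

```python
import threading


class Counter:
    """Shared counter; the lock makes the read-modify-write atomic."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # without this, concurrent updates can be lost
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: Counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Run many increments in parallel and return the final count."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


# With the lock in place, the final count must equal threads * per_thread.
assert hammer(Counter()) == 8 * 10_000
```

A stress test like this can only catch a race some of the time, which is one reason a second reviewer (human or model) reading the locking logic is still worth it.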
"I don’t have to think about what I need to test for."
Good luck then. The business process flow including edge cases should arguably be top of mind for what to test. Testing shouldn't be an afterthought but rather an integral thought when writing the code that needs to be tested.
"I would have to have two or three more junior people doing the work"
Yeah, and they're the ones thinking about testing the code they write. Architects (and it sounds like you are an architect, not a dev) don't get into that much detail.
I input x and I expect y behavior and check for corner cases - just like I have checked for correctness for 40 years. Why do I care how the code was generated as long as it has the correct behavior?
Of course multithreaded code is the exception unless the LLM is putting a bunch of rnd() calls in the code to make it behave differently.
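The "input x, expect y, check the corner cases" approach from this comment can be sketched as a table of cases run against the code as a black box. Everything here is hypothetical: `chunk` stands in for whatever function the model generated, and `CASES` and `check_behavior` are illustrative names. Nothing in the check depends on how `chunk` was written, only on what it returns.

```python
def chunk(items: list, size: int) -> list[list]:
    """Stand-in for generated code: split items into lists of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# (arguments, expected result) pairs, including the corner cases.
CASES = [
    (([1, 2, 3, 4], 2), [[1, 2], [3, 4]]),  # typical input
    (([1, 2, 3], 2), [[1, 2], [3]]),        # uneven remainder
    (([], 3), []),                          # empty input
    (([1], 5), [[1]]),                      # size larger than input
]


def check_behavior(fn) -> list[str]:
    """Return a list of failure messages; empty means every case passed."""
    failures = []
    for args, expected in CASES:
        actual = fn(*args)
        if actual != expected:
            failures.append(f"{fn.__name__}{args}: expected {expected}, got {actual}")
    # Invalid input should be rejected rather than silently accepted.
    try:
        fn([1, 2], 0)
        failures.append("size=0 should raise ValueError")
    except ValueError:
        pass
    return failures


assert check_behavior(chunk) == []
```

The objection raised earlier in the thread still applies: the table is only as good as the corner cases someone thought to put in it.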