I suspect it is very hard to get a consultant that doesn't hand over a repo of AI generated nonsense right now. For the reason above, they are (usually) under time pressure to deliver, and they can wash their hands of the results at the end of the contract.
I am at 2 for 2 at the moment in the "infrastructure as code" arena (I wasn't involved in choosing to use a consultant, just in dealing with the output), which is an area that AI was supposed to eat for lunch. And it seems like it should: DSLs with a narrower scope seem perfect for an LLM. But I'm not convinced.
I think the issue is that infrastructure DSLs like Terraform or Azure Bicep are distilling down an architecture that has complex interactions and often needs a lot of "inside baseball" knowledge from outside of the code to create a congruent result. Unless you feed a bible of markdown files to the LLM to guide it in the right direction, the output goes off the rails fast. The time spent creating the bible might as well be spent creating the code.
Of course there are areas where an LLM will definitely help, like refactoring, stamping out boilerplate or even building on a solid base. But attempt to create even a semi-complex architecture from scratch using a few paragraphs of prompt and you are asking for trouble.
The trouble with the consultants I have interacted with is that they don't write the bible first. As far as I can tell they just iterate on slop, and you end up with multiple 800+ line PowerShell scripts in IaC pipelines and other craziness that is almost impossible to unpick after they have gone.
At a granular level, it's almost guaranteed that you cannot write better code than an agent.
Agents are now writing extremely consistent, normalized, canonical code that usually compiles the first time.
Right out of the 'textbook'.
For what it's trying to do - it writes nearly perfect code.
The only things you could nominally disagree with are some of the conventions and idioms.
It 'writes a perfect novel, in perfect prose'.
What it will not do, however, is 'write the novel that's in your head'.
And that's the crux of it.
It's not even your job to 'write code' at this point, but rather to be the storyteller - and a very good editor who has enough taste and grasp of grammar to know when it's going awry.
It will make mostly what you tell it to. The quality of the output is the quality of your guidance, but at the lowest levels it's generating extremely high quality syntactic prose.
I don't think LLMs inherently do anything perfectly. They can make sure code compiles and passes tests, and they can be trained to do an enormous array of tasks, but the code they generate isn't perfect: the model is selecting one of many possible outputs based on some numbers it came up with after a few matrix multiplications and ReLU activations.
Those matrix multiplications aren't a divine perfect thing. They suffer from floating point precision issues and training data issues, and there's still debate over whether adversarial examples are just an unsolvable property of our linear-algebra-based neural network architectures.
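To make the floating point remark concrete, here is a minimal Python sketch. It is my own illustration with arbitrary values, not anything taken from a real model: floating point addition is not associative, so summing the same numbers in a different order can give a slightly different result.

```python
# Illustrative values only; nothing here comes from an actual network.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Same three numbers, different grouping, slightly different sums.
print(left == right)
print(abs(left - right))
```

Whether a discrepancy that small has any bearing on the quality of generated code is a separate question.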
Can they do things way faster than a human? No doubt. Can they do very complex tasks? Yes. Do they do things with perfection? Not by our human definition of perfect.
"Those matrix multiplications aren't a divine perfect thing. They suffer from floating point precision issues " - this is not the right intuition.
"Not by our human definition of perfect."?
'Human definition' has nothing to do with it.
Your job is to define what you want. To the extent you can do that, the AI does really well at a certain scale: at the 'functional' scale, nearly perfectly.
AI can already easily emulate human creative output [1][2] and there is zero chance that you or anyone else can detect AI output with any degree of consistency.
Even with the few 'tell tale' patterns it's been leaving ... that threshold is being passed quite quickly. Within not even 6 months, works will be identical for all but some specific activities.
Holding LLMs to the standards of human contextual understanding, without communicating sufficient context, can be dispensed with as fantasy.
Using LLMs, you quickly learn how much can be inferred from what already exists, which is dynamic and particular to each model today. There will always be a gap in how much instruction is needed, due to mismatches between what exists and what is intended. In the current state, you don't need much to get a lot out that can be verified prior to merge.
LLMs + Harnesses are incredibly effective, as evidenced by the literal millions of people who are paying quite a lot to use them, who speak glowingly of them and would 'never go back'.
Whatever 'shape they take' - they are obviously useful - ergo - 'you're doing something wrong' if you can't make use of them for most tasks.
Mileage varies, there are downsides, but it's the same with anything.
I think you have just written the epitaph for corporate software.