| I work in DevOps at a firm that has been very enthusiastic about using LLMs (in the good sense). The phases were basically: - try out having the LLM do "a lot" - now even more - now run multiple agents - back to single agents but have the agents build tools - tools that are deterministic AND usable by both the humans (EDIT: and the LLMs) The reasons: 1. Deterministic tools (for both deployments and testing) get you a binary answer and it's repeatable 2. In the event of an outage, you can always fall back to the tool that a human can run 3. It's faster. A quick script can run in <30 seconds but "confabulating" always seemed to take 2-3 minutes. Really, we are back to this article: https://spawn-queue.acm.org/doi/10.1145/3194653.3197520 aka "make a list of tasks, write scripts for each task, combine the scripts into functions, functions become a system" |
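For what it's worth, here is a rough sketch of the kind of tool described above: a check that gives a binary, repeatable answer and can be run by a human or called by an agent. Everything in it is an assumption for illustration (the `state` dictionary shape, the `version` / `ready_replicas` / `desired_replicas` keys, and the function names); it is not the poster's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_version(state: dict, expected_version: str) -> CheckResult:
    # Hypothetical check: the deployed version must match exactly.
    deployed = state.get("version")
    return CheckResult(
        "version",
        deployed == expected_version,
        f"deployed={deployed} expected={expected_version}",
    )


def check_replicas(state: dict) -> CheckResult:
    # Hypothetical check: every desired replica must be ready.
    ready = state.get("ready_replicas", 0)
    desired = state.get("desired_replicas", 0)
    return CheckResult(
        "replicas",
        desired > 0 and ready == desired,
        f"ready={ready} desired={desired}",
    )


def run_checks(state: dict, expected_version: str) -> tuple[int, list[CheckResult]]:
    # Same input always yields the same results and the same exit code.
    results = [check_version(state, expected_version), check_replicas(state)]
    exit_code = 0 if all(r.passed for r in results) else 1
    return exit_code, results


def format_results(results: list[CheckResult]) -> str:
    # Plain text that is easy for a human to read and for an agent to parse.
    return "\n".join(
        f"{'PASS' if r.passed else 'FAIL'} {r.name}: {r.detail}" for r in results
    )
```

Wrapped in a small command-line entry point that passes the exit code to `sys.exit`, the same script works in a pipeline, in a terminal during an outage, or as a tool an agent calls. The agent only has to read PASS/FAIL instead of reasoning its way to an answer.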
In a lot of cases, though, folks are asking it to go interrogate data they've set up MCPs for and produce reports from it. If you do that, it can give you a different answer each time, even from exactly the same input (sampling is how LLMs work), and you just can't guarantee that any one of them is accurate. It's a probabilistic layer, and the reports you need to generate need to be deterministic.
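To make "deterministic report" concrete, here is a minimal sketch, assuming the input is a list of request records with hypothetical `service` and `status` fields. The aggregation is plain code, the output is rendered with sorted keys, and a hash of the rendered text lets you confirm that two runs over the same data produced the same report.

```python
import hashlib
import json


def build_report(records: list[dict]) -> dict:
    # Count requests and server errors per service (field names are assumed).
    by_service: dict[str, dict[str, int]] = {}
    for record in records:
        stats = by_service.setdefault(record["service"], {"total": 0, "errors": 0})
        stats["total"] += 1
        if record["status"] >= 500:
            stats["errors"] += 1
    return {"services": by_service}


def render(report: dict) -> str:
    # sort_keys makes the text independent of record or insertion order.
    return json.dumps(report, sort_keys=True, indent=2)


def fingerprint(report: dict) -> str:
    # Equal fingerprints mean the rendered reports are byte-for-byte identical.
    return hashlib.sha256(render(report).encode("utf-8")).hexdigest()
```

If an LLM is involved at all, a safer split is to have it write or call something like `build_report` and let the code produce the numbers, since a rerun can then be checked by comparing fingerprints.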
The problem is we're so accustomed to the deterministic nature of the large majority of the software we work with. The output is plausible, too, which only exacerbates the problem. Folks just assume it's correct.