| > While the hallucination problem in LLMs is inevitable [0], they can be significantly reduced... Every article on hallucinations needs to start with this fact until we've hammered that into every "AI Engineer"'s head. Hallucinations are not a bug—they're not a different mode of operation, they're not a logic error. They're not even really a distinct kind of output. What they are is a value judgement we assign to the output of an LLM program. A "hallucination" is just output from an LLM-based workflow that is not fit for purpose. This means that all techniques for managing hallucinations (such as the ones described in TFA, which are good) are better understood as techniques for constraining and validating the probabilistic output of an LLM to ensure fitness for purpose—it's a process of quality control, and it should be approached as such. The trouble is that we software engineers have spent so long working in an artificially deterministic world that we're not used to designing and evaluating probabilistic quality control systems for computer output. [0] They link to this paper: https://arxiv.org/pdf/2401.11817 |
I think that's a mischaracterization and not really accurate. As a trade, we're familiar with probabilistic/non-deterministic components and how to approach them.
You were closer when you used quotes around "AI Engineer" -- many of the loudest people involved in generative AI right now have little to no grounding in engineering at all. They aren't used to looking at their work through "fit for purpose" concerns, compromises, efficiency, limits, constraints, etc -- whether that work uses AI or not.
The rest of us are variously either working quietly, getting drowned out, or patiently waiting for our respected colleagues-in-engineering to document, demonstrate, and mature these very promising tools for us.
Everything else you said is 100% right, though.