| LLMs _will_ hallucinate no matter what you do. I think you're overthinking it. This is like saying LLMs will never be useful because they hallucinate. That's a known issue, and yet of course they have been proven to be often quite useful nonetheless. What it comes down to is, how often do they hallucinate, what's the negative impact when they do (both of which can be measured) and very importantly: for whatever their measured performance is, how does it compare to the next best alternative that users have? It's not like they're trying to build a model to design a nuclear reactor in one go. It's just a Q+A bot, whose performance can be easily measured by benchmarking it against the top 30 questions or so in a given subject area (probably accounting for 95 percent of all inputs). And the current alternative users have (search engines) is pretty darn mediocre. BTW I'm actually not much of a fan of LLMs or chatbots, so I have nothing to "sell" you here. But this is my rough take, based on my generally quite skeptical attitude toward this technology. Which does seem to suggest that, at the very least, it's an idea worth exploring. |
The problem is not how often it hallucinates but how bad a single error can be. My point was about controversial topics, where a single error can do serious damage. Yes, it must be as error-free as a nuclear reactor design. Just imagine a Q&A chatbot answering questions about Israel and Palestine, or something else really touchy: do you really think you can afford any error or hallucination?
[0]: https://arxiv.org/abs/2409.05746
[1]: https://arxiv.org/abs/2401.11817