Hacker News
by lolinder 557 days ago
> how the product is pitched and presents to users.

And this is why I feel it's so important to fix the way we talk about hallucinations. Engineers need to be extremely clear with product owners, salespeople, and other business folks about the inherent limitations of LLMs: that certain things, like factual accuracy, may asymptotically approach 100% but will never reach it, and that even getting asymptotically close to 100% is extremely (most likely prohibitively) expensive. And once the business has chosen a non-zero failure rate, engineers have to be clear about what the consequences of that failure rate are.

Before engineers can communicate that to the business side, they have to have that straight in their own heads. Then they can set expectations with the business and make sure it understands that once you've chosen a failure rate, individual 'hallucinations' can't be treated as bugs to troubleshoot. You need instead to have an industrial-style QC process that measures trends and reacts only when your process produces results outside of a set of well-defined tolerances.
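To make the QC idea concrete, here is a minimal sketch of one common way to do it: a proportion control chart (p-chart) over batches of sampled responses. The function names, the 2% baseline failure rate, the batch size of 500, and the three-sigma limits are all assumptions chosen for illustration, not numbers from this thread.

```python
import math


def p_chart_limits(baseline_rate, sample_size, sigmas=3.0):
    """Return (lower, upper) control limits for a failure proportion.

    baseline_rate is the failure rate the business agreed to accept
    (an assumed input); sample_size is the number of responses reviewed
    in one batch.
    """
    spread = sigmas * math.sqrt(baseline_rate * (1 - baseline_rate) / sample_size)
    return max(0.0, baseline_rate - spread), min(1.0, baseline_rate + spread)


def exceeds_tolerance(failures, sample_size, baseline_rate, sigmas=3.0):
    """True only when the batch failure rate is above the upper control limit."""
    _, upper = p_chart_limits(baseline_rate, sample_size, sigmas)
    return failures / sample_size > upper


# Hypothetical batches of 500 reviewed responses against a 2% agreed rate.
print(exceeds_tolerance(12, 500, 0.02))  # False: 2.4% is within the limits
print(exceeds_tolerance(25, 500, 0.02))  # True: 5.0% is above the upper limit
```

Under this scheme a single wrong answer never triggers anything on its own. Only a batch whose failure rate drifts past the upper limit does.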

(Yes, I'm aware that many organizations are so thoroughly broken that engineering has no influence over what business tells customers. But those businesses are hopeless anyway, and many businesses do listen to their engineers.)


> individual 'hallucinations' can't be treated as bugs to troubleshoot

You are wrong here - my company can fix individual responses by adding specific targeted data for the RAG prompt. So a JIRA ticket for a wrong response can be fixed in 2 days.

It's important to understand that you're addressing the problem by adding a layer on top of the core technology, to mitigate or mask how it actually works.

At scale, your solution looks like bolting an expert system on top of the LLM. Which is something that some researchers and companies are actually working on.
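The layering described above can be sketched roughly as follows: a small table of hand-vetted corrections is matched against the question and placed ahead of the normally retrieved documents in the prompt. Everything here (the `Correction` type, the keyword matching, the prompt layout, and the sample refund text) is hypothetical, made up for illustration. It is not the commenter's actual system.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    keywords: frozenset  # lower-case terms that must all appear in the question
    snippet: str         # hand-vetted text to place in the prompt context


def matching_corrections(question, corrections):
    """Return the vetted snippets whose keywords all occur in the question."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return [c.snippet for c in corrections if c.keywords <= words]


def build_prompt(question, retrieved_docs, corrections):
    """Put vetted corrections ahead of the ordinary retrieved documents."""
    context = matching_corrections(question, corrections) + list(retrieved_docs)
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


# Hypothetical correction added in response to a ticket about a wrong answer.
corrections = [
    Correction(frozenset({"refund", "window"}), "Refunds are accepted within 30 days."),
]
print(build_prompt("What is the refund window?", ["General policy text."], corrections))
```

Each ticket adds one more hand-written rule, which is why at scale this starts to resemble an expert system sitting in front of the model.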

Wow, that sounds great: just have every customer who interacts with your LLM come back to the site in 2 days to get the real answer to their question. How can I invest?

I've said it before, but I'm not convinced LLMs should be public-facing. I know some companies have been burned by them, and in my opinion, LLMs should be about helping customer support people find answers faster.

> LLM should be about helping customer support people find answers faster

That would be as dangerous as any other function: you still need personnel verified as trustworthy in processing unreliable input.

Yes, that is the point. The customer service person would be most able to determine if what the LLM said makes sense or not. My point is, we are sold automation instead of a power tool.