by wg0 950 days ago
The problem remains if the model "hallucinates" and makes information up. For example, it says a candidate is a Lisp expert and later it turns out they are not.

If that happens even 10 percent of the time, it'll get a reputation for being unreliable, which would lead to "oh, why waste time, I'd better ask the candidate directly" situations.

But other than that, it's a very creative idea. Really. 10 out of 10.

EDIT: Idea is great nevertheless.

3 comments

Yeah, that was REALLY finicky to get right. I have a few ways to prevent hallucinations, and I haven't gotten any in the latest iteration, even with really crazy questions being asked. (I encourage you to try to break it.)

This is partly why I show the source Snippets (Q/A pairs written directly by the owner) below the summary as a way to verify the information. Kind of like the AI 'showing its work'. It also lets you see more about the owner, which is a nice side benefit, or maybe the main benefit.
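
The "summary plus source Snippets" idea can be sketched roughly as below. This is not the author's implementation: the `Snippet` type, the word-overlap search, and the `summarize` callback (standing in for whatever model call the real tool makes) are all hypothetical names chosen for illustration. The point it shows is that the summary is only ever returned together with the owner-written Q/A pairs it was built from, and that no summary is produced when nothing matches.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """One Q/A pair written directly by the profile owner (hypothetical type)."""
    question: str
    answer: str


def find_snippets(query, snippets, limit=3):
    """Rank snippets by shared words with the query (a stand-in for real search)."""
    query_words = set(query.lower().split())
    scored = []
    for snippet in snippets:
        text_words = set(f"{snippet.question} {snippet.answer}".lower().split())
        overlap = len(query_words & text_words)
        if overlap:
            scored.append((overlap, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:limit]]


def answer_with_sources(query, snippets, summarize):
    """Return a summary together with the snippets it was generated from."""
    sources = find_snippets(query, snippets)
    if not sources:
        # Nothing written by the owner matches, so do not generate anything.
        return {"summary": None, "sources": []}
    return {"summary": summarize(query, sources), "sources": sources}


snippets = [
    Snippet("Do you know Lisp?", "I wrote Common Lisp for two years."),
    Snippet("Favorite editor?", "Emacs"),
]
# `summarize` is stubbed out here; a real tool would call a model instead.
result = answer_with_sources("lisp experience", snippets, lambda q, s: "stub summary")
print(result["summary"])
for source in result["sources"]:
    print("-", source.question, "/", source.answer)
```

A reader who doubts the summary can check it against the printed sources, which is the verification step described above.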

I can also turn off the AI summary part and leave the AI search part. If this becomes a bigger thing, I might give users a way to enable/disable the potentially hallucinogenic part, but it'll be their choice.

Ack on your edit. Thanks, it means a lot.

I think there's a lot of work to do to work out the kinks with things like hallucinations, but I think we sometimes forget that so much of what we do is already statistical in nature. Your car has a statistical chance of breaking down within x timeframe while you're on the highway; it's just low enough that you don't worry about it. I think AI will be similar: we have to get comfortable with how we evaluate and mitigate risks, but like many things, they'll never be 0.

The idea of the tool you're using "hallucinating" is quite asinine. It makes me question why anyone is building products using ML systems that can do that. Like, why put a rudder on your plane that very rarely just decides to fly you into the ground?

For this precise reason, I would never put LLMs in my product (if I had one), be it financial (tell me which two products are doing well together in winter seasons near XYZ location) or user facing (how can I turn off photo sharing so that only I see what I take with my camera) or something similar, because trusting LLMs might lead to serious trouble if the output is wrong.

Just dry-run through the scenarios and assume the LLM's output is wrong even 10% of the time.
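
To put a number on that dry run, here is a rough sketch. It assumes each answer is wrong independently with the same probability, which is a simplification and not something the thread establishes; the function name is made up for illustration. Under that assumption, the chance that at least one of n answers is wrong is 1 - (1 - p)^n.

```python
def chance_of_any_wrong_answer(error_rate: float, answers: int) -> float:
    """Probability that at least one of `answers` independent outputs is wrong."""
    return 1.0 - (1.0 - error_rate) ** answers


# Assumed 10% per-answer error rate, the figure used in the comment above.
for answers in (1, 5, 10, 20):
    p = chance_of_any_wrong_answer(0.10, answers)
    print(f"{answers:>2} answers -> {p:.0%} chance of at least one wrong")
```

With these assumptions, ten answers already carry roughly a 65% chance of containing at least one wrong one.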

LLMs were never designed to be fact checkers. They're text generators, not fact generators.

Additionally, you don't need ML to get your computer to confidently show you something that is completely wrong. All it takes is multiplying specific floating point numbers repeatedly, really.
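
A small sketch of the floating point point, for the curious. The comment doesn't say which numbers it has in mind, so the factor 1.1 and the count of 10 are assumptions picked for illustration. The code multiplies the same decimal factor repeatedly, once with binary floats and once with exact rationals, then measures the gap. In this mild case the error is tiny, but it is never zero, and the float result is printed with no hint that it is approximate.

```python
from fractions import Fraction


def repeated_product(factor_text: str, times: int):
    """Multiply a decimal factor by itself `times` times, in floats and exactly."""
    float_result = 1.0
    exact_result = Fraction(1)
    for _ in range(times):
        float_result *= float(factor_text)     # binary float, may round at each step
        exact_result *= Fraction(factor_text)  # exact rational arithmetic
    return float_result, exact_result


float_result, exact_result = repeated_product("1.1", 10)
# Fraction(float_result) recovers the exact value the float actually stores.
error = Fraction(float_result) - exact_result
print(f"float : {float_result!r}")
print(f"exact : {exact_result}")
print(f"error : {float(error):.3e}")
```

The gap is guaranteed to be nonzero here because 1.1^10 has a denominator of 10^10, which no binary float can represent exactly.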