by lolinder 562 days ago
> Computability isn't the problem. LLMs are forced to a reply, regardless of the quality of the reply. If "Confidence level is too low for a reply" is an option, the argument in that paper becomes invalid.

This is false. The confidence level of these models does not encode facts; it encodes the statistical probability that a particular word would come next in the training data set. One source of output that is not fit for purpose (i.e. hallucinations) is unfit information in the training data, and cleaning that up is intractable given the size of the data required to train a base model.
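
To make that concrete, here is a deliberately oversimplified sketch. Real LLMs are neural networks, not count tables, and the toy corpus and the `next_word_probs` helper are made up for illustration. It only shows how "confidence" can track frequency in the training data rather than truth:

```python
from collections import Counter

# Hypothetical toy "training data": the wrong completion is simply more common.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def next_word_probs(prefix, corpus):
    """Estimate P(next word | prefix) by counting continuations in the corpus."""
    prefix_words = prefix.split()
    n = len(prefix_words)
    counts = Counter()
    for line in corpus:
        words = line.split()
        # Only count lines that start with the prefix and have a next word.
        if words[:n] == prefix_words and len(words) > n:
            counts[words[n]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

probs = next_word_probs("the capital of australia is", corpus)
print(probs)  # {'sydney': 0.75, 'canberra': 0.25}
```

A confidence threshold of, say, 0.7 would happily let the wrong answer through here, because the number measures how common the continuation was, not whether it is correct.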

You can reduce this problem by managing your training data better, but you can't do that perfectly, which gets to my point: managing hallucinations is entirely about risk management, i.e. reducing the probability of failure to an acceptable level. It's not decidable, only manageable, and only for applications whose stakes are low enough that a 99.9% (or whatever) success rate is acceptable. It's a quality control problem, and one that will always be a battle.
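
As a rough back-of-the-envelope sketch of why the acceptable rate depends on the stakes: assuming each response fails independently (a simplifying assumption, and `p_any_failure` is just an illustrative helper), the chance of seeing at least one bad answer grows quickly with volume.

```python
def p_any_failure(success_rate, n_responses):
    """Probability of at least one failure in n independent responses."""
    return 1 - success_rate ** n_responses

# 99.9% per-response success rate, at increasing volumes.
for n in (1, 100, 1000, 10000):
    print(n, round(p_any_failure(0.999, n), 3))
```

At 1,000 responses the chance of at least one failure is already well over half, which is fine for autocomplete suggestions and not fine for anything where a single bad answer is costly.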

> Alibaba's QwQ [1] supposedly is better at reporting when it doesn't know something. Comments on that?

I've been trying it out, and what it's actually better at is going in circles indefinitely, giving the illusion of careful thought. This can possibly be useful, but it's just as likely to "hallucinate" nonsensical reasons why its first (correct) response might have been wrong as it is to catch and fix a real mistake.