Hacker News new | ask | show | jobs
by anon291 620 days ago
You need to separate out the LLM, which only produces a set of probabilities, from the system, which includes the LLM and the sampling methodology. Sampling is currently not very intelligent at all.

The next bit of confusion is that the 'probability' isn't 'real'. It's not an actual probability but a weight that sums up to one, which is close enough to how probability works that we call it that. However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them. A fixed threshold is not a good idea in this case. Instead, smarter sampling methods are necessary. One possibility is that if we do have seeming confusion, to put a 'confusion marker' into the text and predict the next output and train models to refine the answer as they go along. Not sure if any work has been done here, but this seems to go along with what you're interested in

1 comments

> However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them.

That's the result after softmax. If you want to act on the raw results, you can still do that.

The results before softmax don't sum to one so don't even act like a probability distribution. And that's the point. When you have the pre-softmax activations, there are infinitely many ways to convert them to something probability-like. You can normalize them after taking the square root, the square, raising to three, etc. Or you can exponentiate and for some reason that does better. Either way it's not a 'real' probability distribution.