| HN Mirror



	by saeranv 814 days ago
	> The idea is that if you can produce an accurate probability distribution over the next bit/byte/token... But how can you get credible probability distributions from LLMs? My understanding is that the outputs can't really be interpreted as a probability distribution, even though they superficially resemble a PMF, because the softmax tends to put close to 100% on the predicted token. You can still get an ordered list of the most probable tokens (which I think beam search exploits), but the values themselves aren't a good representation of the output probability distribution, since they don't model the variance well.

1 comment

blackle 814 days ago

My understanding is that minimizing perplexity (what LLMs are generally optimized for) is equivalent to finding a good probability distribution over the next token.
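
A rough sketch of why that holds, using a toy example rather than anything from a real LLM: perplexity is the exponential of the cross-entropy, and the expected cross-entropy is lowest when the predicted distribution matches the true one. The three-token distribution `p_true` and the candidate predictions below are made-up numbers, purely for illustration.

```python
import math

def cross_entropy(p_true, q_model):
    """Expected negative log-likelihood (in nats) of q_model when tokens are drawn from p_true."""
    return -sum(p * math.log(q) for p, q in zip(p_true, q_model) if p > 0)

def perplexity(p_true, q_model):
    """Perplexity is the exponential of the cross-entropy."""
    return math.exp(cross_entropy(p_true, q_model))

# Hypothetical "true" next-token distribution over a 3-token vocabulary.
p_true = [0.7, 0.2, 0.1]

# Hypothetical model predictions to compare against it.
candidates = {
    "matches p_true": [0.7, 0.2, 0.1],
    "overconfident": [0.98, 0.01, 0.01],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
}

scores = {name: perplexity(p_true, q) for name, q in candidates.items()}
best = min(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: perplexity = {value:.3f}")
print("lowest perplexity:", best)
```

In this toy setup the overconfident prediction scores worse than even the uniform one, so a model trained to minimize perplexity is penalized for putting nearly 100% on one token when the data doesn't support it. Whether trained LLMs actually end up well calibrated in practice is a separate empirical question.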