Hacker News new | ask | show | jobs
by ndand 451 days ago
I understand it differently,

LLMs predict distributions, not specific tokens. Then an algorithm, like beam search, is used to select the tokens.

So, the LLM predicts somethings like, 1. ["a", "an", ...] 2. ["astronomer", "cosmologist", ...],

where "an astronomer" is selected as the most likely result.

2 comments

Just to be clear, the probability for "An" is high, just based on the prefix. You don't need to do beam search.
They almost certainly only do greedy sampling. Beam search would be a lot more expensive; also I'm personally skeptical about using a complicated search algorithm for inference when the model was trained for a simple one, but maybe it's fine?