| HN Mirror


by ndand 451 days ago

I understand it differently.

LLMs predict distributions, not specific tokens. Then an algorithm, like beam search, is used to select the tokens.

So, the LLM predicts something like: 1. ["a", "an", ...] 2. ["astronomer", "cosmologist", ...],

where "an astronomer" is selected as the most likely result.
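A minimal sketch of the two selection strategies mentioned here, greedy decoding and beam search, run over a toy next-token distribution. Everything in it is an assumption for illustration: `TOY_MODEL`, `next_token_distribution`, and the probabilities are made up, and a real LLM would compute each distribution from the full context rather than look it up in a table.

```python
import math

# Hypothetical next-token distributions, keyed by the tokens generated so far.
# The probabilities are invented for illustration only.
TOY_MODEL = {
    (): {"an": 0.6, "a": 0.4},
    ("an",): {"astronomer": 0.9, "cosmologist": 0.1},
    ("a",): {"cosmologist": 0.7, "astronomer": 0.3},
}


def next_token_distribution(prefix):
    """Stand-in for a model forward pass: returns P(next token | prefix)."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """Pick the single most probable token at every step."""
    prefix = []
    for _ in range(steps):
        dist = next_token_distribution(prefix)
        prefix.append(max(dist, key=dist.get))
    return prefix


def beam_search(steps, beam_width):
    """Keep the beam_width most probable partial sequences at every step."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, logp in beams:
            for tok, p in next_token_distribution(tokens).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


greedy_result = greedy_decode(2)
beam_result = beam_search(2, beam_width=2)
print(greedy_result, beam_result)
```

With these made-up numbers both strategies return the same sequence, because the first-step distribution already favors "an". The two can diverge only when a locally less likely first token leads to a more probable overall sequence.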

2 comments

colah3 451 days ago

Just to be clear, the probability for "An" is high, just based on the prefix. You don't need to do beam search.


astrange 450 days ago

They almost certainly only do greedy sampling. Beam search would be a lot more expensive; also I'm personally skeptical about using a complicated search algorithm for inference when the model was trained for a simple one, but maybe it's fine?
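A rough sketch of the cost argument, counting model evaluations per generated sequence. The function names are hypothetical, and the model is deliberately crude: it assumes one evaluation per live hypothesis per step, a vocabulary at least as large as the beam width, and no batching or caching effects, all of which change the real numbers.

```python
def greedy_forward_passes(num_tokens):
    """Greedy decoding extends one sequence, so one evaluation per token."""
    return num_tokens


def beam_forward_passes(num_tokens, beam_width):
    """Beam search extends every live hypothesis at each step.

    The first step expands only the empty prefix; every later step
    expands beam_width hypotheses (assuming the beam is full by then).
    """
    if num_tokens == 0:
        return 0
    return 1 + (num_tokens - 1) * beam_width


greedy_cost = greedy_forward_passes(100)
beam_cost = beam_forward_passes(100, beam_width=4)
print(greedy_cost, beam_cost)
```

Under these assumptions the extra cost grows roughly linearly with the beam width, which is the sense in which beam search is "a lot more expensive" than picking one token at a time.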
