by anon373839
39 days ago
The model outputs a probability distribution for the next token, given the sequence of all previous tokens in the context window. It's just a list of floats, one per token in the tokenizer's vocabulary and in the same order. After that, a piece of software that is NOT the LLM chooses the next token. This is called the sampler. There are different sampling parameters and strategies available, but if you want repeatable* outputs, just take the token with the highest probability (greedy decoding).

* Perfect determinism in this sense is difficult to achieve because GPU floating-point calculations are slightly nondeterministic from run to run. But you can get very close.
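To make the split between model and sampler concrete, here is a minimal sketch in Python. The vocabulary and the probabilities are made up for illustration, and `greedy_sample` and `random_sample` are hypothetical helper names, not functions from any particular library.

```python
import random

# Hypothetical 4-token vocabulary and a made-up model output:
# one probability per token, in the same order as the vocabulary.
vocab = ["the", "cat", "sat", "mat"]
probs = [0.10, 0.60, 0.05, 0.25]


def greedy_sample(probs):
    """Return the index of the highest-probability token (repeatable)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def random_sample(probs, rng):
    """Return an index drawn at random, weighted by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Greedy decoding picks the same token every time for the same probabilities.
greedy_id = greedy_sample(probs)
print(vocab[greedy_id])

# Weighted random sampling can pick a different token on each call,
# unless the random number generator is seeded.
rng = random.Random(0)
random_id = random_sample(probs, rng)
print(vocab[random_id])
```

The greedy path has no randomness of its own, so any remaining variation between runs comes from the probabilities themselves differing slightly, which is the caveat in the footnote.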