by klipt
437 days ago
> LLMs learn from examples where the training targets are not probabilities but one-hot labels for how a given sentence actually continues (only one token is set to 1). But enough data implies probabilities. Consider two sentences: "For breakfast I had oats" and "For breakfast I had eggs". Training on this data, how do you complete "For breakfast I had..."? There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs".
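
To make the breakfast example concrete, here is a rough sketch in plain Python. It is not how any real LLM is trained: the three-word vocabulary (with "toast" added as a distractor), the starting logits, the learning rate, and the step count are all assumptions chosen for illustration. It runs gradient descent on the mean cross-entropy against the two one-hot targets and prints the resulting softmax distribution, which should end up close to an even split between "oats" and "eggs".

```python
import math

# Hypothetical toy vocabulary; "toast" never appears in the data.
vocab = ["oats", "eggs", "toast"]

# Each training example is a one-hot target: the token that actually
# followed "For breakfast I had" in that sentence.
data = ["oats", "eggs"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Average of the one-hot targets, i.e. the empirical next-token frequencies.
target = [sum(1 for tok in data if tok == v) / len(data) for v in vocab]

# Arbitrary, deliberately lopsided starting logits (an assumption).
logits = [2.0, -1.0, 0.5]
lr = 0.5

for _ in range(5000):
    probs = softmax(logits)
    # The gradient of mean cross-entropy w.r.t. the logits is
    # (predicted probabilities - averaged one-hot targets).
    logits = [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]

probs = dict(zip(vocab, softmax(logits)))
print({tok: round(p, 3) for tok, p in probs.items()})
```

No single training example ever contains the number 0.5, yet the loss is minimized only when the predicted distribution matches the frequencies in the data, so the probabilities emerge from the one-hot labels.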