

by klipt 483 days ago

> LLMs learn from examples where the logits are not probabilities, but how a given sentence continues (only one token is set to 1).

But with enough data, those one-hot examples still imply probabilities. Consider two sentences:

"For breakfast I had oats"

"For breakfast I had eggs"

After training on this data, how should the model complete "For breakfast I had..."?

There is no best deterministic answer. The best answer, the one that minimizes the training loss averaged over both examples, is a 50/50 probability distribution over "oats" and "eggs".
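
To make that concrete, here is a rough sketch of the argument in Python. It is a toy, not how a real LLM is trained: it assumes a two-word vocabulary, a single pair of free logits for the prefix "For breakfast I had", and plain gradient descent on the mean cross-entropy. The names `softmax` and `train`, the starting logits, the step count, and the learning rate are all made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["oats", "eggs"]

# One-hot targets: each training sentence sets exactly one continuation to 1.
targets = [
    [1.0, 0.0],  # "For breakfast I had oats"
    [0.0, 1.0],  # "For breakfast I had eggs"
]

def train(targets, steps=500, lr=0.5):
    # Hypothetical setup: start from deliberately lopsided logits.
    logits = [2.0, -1.0]
    n = len(targets)
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean cross-entropy w.r.t. the logits is
        # (predicted probabilities) - (average of the one-hot targets).
        grad = [probs[i] - sum(t[i] for t in targets) / n
                for i in range(len(logits))]
        logits = [z - lr * g for z, g in zip(logits, grad)]
    return softmax(logits)

probs = train(targets)
print(dict(zip(vocab, (round(p, 3) for p in probs))))
```

Even though every individual target is a hard 0 or 1, the gradient only vanishes when the predicted distribution matches the average of the targets, so the predictions drift toward an even split between "oats" and "eggs".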

1 comment

So is it still largely probabilistic pattern matching?

You can model the whole universe with probabilities!