by IanCal 719 days ago
> it's just the most probable character after the next.

That's simply not true. You're confusing how they're trained with what they do. They don't have some store of exactly how likely each word is for every possible sentence (and it's worth stopping to think about what that would even mean).


> That's simply not true.

It's a simplification. As an example, temperature also influences the sampling, so the output is not always the most probable character.

No, it's fundamentally not true, because when you say "most likely", what you actually get is the highest-valued output of the model, not what's most likely either in the underlying data or under the objective it is being trained for.
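
To make the two points above concrete, here is a minimal sketch of greedy picking versus temperature sampling. Everything in it is an assumption for illustration: the token list, the scores in `logits`, and the helper names `softmax`, `greedy` and `sample` are made up, not taken from any real model or library. Note that the "probabilities" here are just the model's own scores normalised, which is a different thing from how often anything occurs in the training data.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Normalise raw model scores into a distribution, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Return the index with the highest model score (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temperature, rng):
    """Draw an index at random from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens (made-up numbers).
tokens = ["cat", "dog", "fish"]
logits = [2.0, 1.5, 0.1]

rng = random.Random(0)  # fixed seed so repeated runs match

# Greedy decoding always returns the top-scoring token.
print("greedy:", tokens[greedy(logits)])

# Sampling at temperature 1.0 spreads picks across all three tokens.
counts = {t: 0 for t in tokens}
for _ in range(1000):
    counts[tokens[sample(logits, 1.0, rng)]] += 1
print("sampled:", counts)
```

Lowering `temperature` towards 0 pushes `sample` towards the same answer as `greedy`, and raising it flattens the distribution. Either way, what is being ranked is the model's output scores, not a lookup of how likely each continuation is in the data.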