Hacker News new | ask | show | jobs
by astrange 1204 days ago
That's how the text decoder works, but the model gets to define "most likely" and an RLHF model uses this to make the text decoder produce useful answers instead.