by kewp 1202 days ago
are these LLMs just answering the question "if you found this text on the internet (the prompt) what would most likely follow" ?
4 comments

In essence, yes, I think. But is that really much different from what I'm doing in writing this comment?
That's how they are trained initially, but the resulting model isn't all that useful (it was SOTA two years ago, but this field moves fast).

A lot of the utility comes from the later finetuning. You can see this using the examples from the article: every mistake they identify with GPT-3 (which is the unfinetuned version) is answered correctly by ChatGPT, which has gone through an extensive finetuning process called RLHF.

Yes, they are being trained, to simplify, to complete sentences. You can then use the resulting model to do lots of things.

How you train a model and the inference jobs it can do don't necessarily have to be the same.
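
To make that concrete, here's a rough sketch (my own toy example, not anything from the article): a bigram model that is "trained" only to predict the next word, then reused for a different job, namely picking the more plausible of two sentences. The corpus, the add-one smoothing, and the names `train_bigram`, `log_prob` and `more_plausible` are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Training objective: count which word follows which (next-word prediction)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def log_prob(counts, sentence, vocab_size, alpha=1.0):
    """Log-probability of a sentence under the bigram model (add-alpha smoothing)."""
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size))
    return total

def more_plausible(counts, a, b, vocab_size):
    """A different inference job: rank two candidates instead of completing text."""
    return a if log_prob(counts, a, vocab_size) >= log_prob(counts, b, vocab_size) else b

# Hypothetical toy corpus, just for the example.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]
counts = train_bigram(corpus)
vocab_size = len({word for s in corpus for word in s.split()})

print(more_plausible(counts, "the cat sat", "sat cat the", vocab_size))
```

The model was only ever fit to "what word comes next", but the same probabilities can be used to score or rank text, which is a different task from completion.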

That's how the text decoder works, but the model gets to define what "most likely" means, and an RLHF-tuned model uses that to make the text decoder produce useful answers instead.
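
A toy way to picture this (purely illustrative, with hand-written probability tables standing in for real models): the decoder below is identical in both cases and always picks the highest-probability next token. Only the distribution it is handed changes. `BASE`, `TUNED`, `table_model` and `greedy_decode` are hypothetical names, and real RLHF changes the weights of a neural network rather than editing a lookup table.

```python
def greedy_decode(next_token_probs, prompt, max_tokens=5, stop="<eos>"):
    """Generic decoder: repeatedly append whatever the model says is most likely."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)
        if best == stop:
            break
        tokens.append(best)
    return tokens

def table_model(table):
    """Wrap a lookup table as a 'model' that only looks at the last token."""
    return lambda tokens: table[tokens[-1]]

# Stand-in for a base model: after a question, internet text often lists another question.
BASE = {
    "2+2?": {"Q:": 0.6, "A:": 0.4},
    "Q:": {"3+3?": 0.9, "<eos>": 0.1},
    "3+3?": {"<eos>": 1.0},
    "A:": {"4": 0.9, "<eos>": 0.1},
    "4": {"<eos>": 1.0},
}

# Stand-in for a finetuned model: the probabilities now favor answering.
TUNED = {
    "2+2?": {"A:": 0.9, "Q:": 0.1},
    "Q:": {"<eos>": 1.0},
    "3+3?": {"<eos>": 1.0},
    "A:": {"4": 0.95, "<eos>": 0.05},
    "4": {"<eos>": 1.0},
}

prompt = ["Q:", "2+2?"]
base_out = greedy_decode(table_model(BASE), prompt)
tuned_out = greedy_decode(table_model(TUNED), prompt)
print(" ".join(base_out))
print(" ".join(tuned_out))
```

Same decoder, same prompt: the base table continues with another question, while the tuned table produces an answer, because "most likely" is whatever the model says it is.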