
In general, the core language model is simply trained on a very large amount of unannotated text (which is the most time-consuming and expensive part). But a bare language model is not directly very useful in a role such as a chat agent: it quite literally tries to continue the text, and sometimes that is what you want and sometimes it isn't.
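To make "it quite literally tries to continue text" concrete, here is a toy sketch. It is a word-level bigram counter, not a neural network, and the training text is made up. It learns only "which word tends to follow which" from raw text and then greedily extends a prompt.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count which word follows which in raw, unannotated text."""
    words = corpus.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def continue_text(model: dict[str, Counter], prompt: str, n_words: int = 5) -> str:
    """Greedily append the most frequent next word.

    The model has no notion of "answering"; it only extends the text.
    """
    words = prompt.split()
    for _ in range(n_words):
        options = model.get(words[-1])
        if not options:
            break  # never saw this word followed by anything
        words.append(options.most_common(1)[0][0])
    return " ".join(words)
```

Trained on `"what is a cat ? what is a dog ? what is a bird ?"` and given the prompt `"what is a cat ?"`, it continues to `"what is a cat ? what is a cat ?"`. It produces another question rather than an answer, because more questions are what followed questions in its training text.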

The second step is fine-tuning the model on a much smaller set of annotated data that specifies that it should actually "do something" in its responses, and what it should do. This "teaches" it to actually answer the questions instead of, e.g., continuing on with a list of more questions in the same vein, and most such training sets also "teach" it that for certain questions the appropriate response is a refusal.
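A rough sketch of what one annotated example might look like when it is turned into training text. The `### Instruction:` / `### Response:` template is a hypothetical choice (each instruction-tuned model uses its own format), and whitespace splitting stands in for a real tokenizer. Many fine-tuning setups, though not all, compute the loss only on the response part. That is what the mask expresses.

```python
from dataclasses import dataclass

# Hypothetical prompt template; real models each define their own format.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


@dataclass
class Example:
    instruction: str
    response: str


def build_training_example(ex: Example) -> tuple[list[str], list[bool]]:
    """Return tokens plus a mask marking which tokens the loss is computed on.

    Whitespace splitting stands in for a real tokenizer here.
    """
    prompt_tokens = TEMPLATE.format(instruction=ex.instruction).split()
    response_tokens = ex.response.split()
    tokens = prompt_tokens + response_tokens
    # Only response tokens are trained on: the model is pushed toward producing
    # the answer, not toward continuing the question text.
    mask = [False] * len(prompt_tokens) + [True] * len(response_tokens)
    return tokens, mask
```

For `Example("What is the capital of France?", "The capital of France is Paris.")`, the tokens selected by the mask are exactly the six words of the response.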

If you have the original core model (before that instruction tuning), you can repeat the same process but replace the instruction training set with a different one, so you can "instruct" the model to behave differently. Here is a nice, informative article by Eric Hartford about how he did that to make certain 'uncensored' models: https://erichartford.com/uncensored-models
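One simple way to get "a different instruction set" is to filter an existing one before fine-tuning. The sketch below drops question/answer pairs whose answer looks like a refusal. The marker phrases are hypothetical, and a plain substring match is a crude stand-in for whatever a careful curation process would actually use.

```python
# Hypothetical marker phrases; a real filter would need a far more careful list or a classifier.
REFUSAL_MARKERS = ("i'm sorry, but", "as an ai language model", "i cannot help with")


def is_refusal(response: str) -> bool:
    """Crude check: does the response contain any known refusal phrase?"""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def filter_dataset(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only (instruction, response) pairs whose response is not a refusal."""
    return [(q, a) for q, a in pairs if not is_refusal(a)]
```

The filtered pairs would then be used in place of the original set in the same fine-tuning step, starting again from the core model.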