by craigacp
858 days ago
There's usually a two- or three-step training procedure: first, training to predict the next word on a huge corpus of text (billions or trillions of words); then, possibly, some instruction tuning (giving the model question & answer pairs and training on the answer); and finally RLHF (or RLAIF, DPO, etc.), where the model is trained to match human preferences. It's this last step that is used to increase the helpfulness & harmlessness of the model, including training it not to respond to certain topics.
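A minimal sketch of the losses behind those stages, assuming a model has already produced log-probabilities (the inputs here are made-up numbers, and the function names are illustrative, not any library's API). The first function is next-word prediction: average negative log-likelihood over all tokens for pretraining, or over only the answer tokens (via a mask) for instruction tuning. The second is the DPO loss for one preference pair. It is only one of the preference methods mentioned; classic RLHF with a reward model and PPO is not shown, and `beta=0.1` is an assumed value.

```python
import math


def next_token_loss(token_logprobs, loss_mask=None):
    """Average negative log-likelihood of the observed tokens.

    token_logprobs[i] is the model's log-probability of the actual token at
    position i given everything before it. With no mask every token counts
    (pretraining); with a mask only positions marked 1 count (e.g. the answer
    tokens during instruction tuning).
    """
    if loss_mask is None:
        loss_mask = [1] * len(token_logprobs)
    picked = [lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return -sum(picked) / len(picked)


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) answer pair.

    Arguments are whole-sequence log-probabilities under the model being
    trained (policy) and a frozen reference copy of it. The loss shrinks as
    the policy raises the preferred answer relative to the rejected one,
    measured against the reference model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Prompt is 2 tokens, answer is 2 tokens.
logprobs = [-2.0, -1.5, -0.5, -0.1]
print(next_token_loss(logprobs))                # pretraining-style: all tokens
print(next_token_loss(logprobs, [0, 0, 1, 1]))  # instruction tuning: answer only
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))         # policy already favours the chosen answer
```

When the policy and reference agree exactly, the DPO loss is log 2; pushing the chosen answer's probability up (or the rejected one's down) relative to the reference lowers it.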