by wilhelm____
1056 days ago
Pre-training develops the language model's base understanding of conditional word probabilities. SFT and RLHF then try to further guide the model toward steerable, aligned output. In fact, the InstructGPT authors were worried about losing the pre-trained model's underlying probability distribution, so they tried a version that penalizes the model for deviating too far from the original distribution (using a KL penalty). I don't remember them seeing a significant difference in performance.
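To make the KL idea concrete, here is a rough sketch of how a penalty like that can be folded into the reward. This is not the InstructGPT code: the function names, the `beta` value, and the toy numbers are all my own assumptions for illustration. The idea is just that the reward-model score is reduced in proportion to how much more likely the tuned policy makes the sampled tokens than the reference model does.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.02):
    """Subtract a KL-style penalty from a scalar reward.

    policy_logprobs / ref_logprobs: log-probabilities that the tuned policy and
    the frozen reference model assign to each token of the sampled response.
    beta: penalty strength (hypothetical value, chosen only for illustration).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob lists must cover the same tokens")
    # Sum of per-token log ratios: a single-sample estimate of the KL between
    # the policy and the reference on this response.
    log_ratio = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * log_ratio


if __name__ == "__main__":
    # Policy identical to the reference: no penalty is applied.
    print(kl_penalized_reward(1.0, [-1.0, -2.0], [-1.0, -2.0]))
    # Policy has drifted toward these tokens: the reward is reduced.
    print(kl_penalized_reward(1.0, [-0.5, -0.5], [-1.0, -1.5], beta=0.1))
```

With a larger `beta` the policy is held closer to the reference distribution; with `beta = 0` it is free to drift wherever the reward model pushes it.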