Pre-training develops the language model's base understanding of conditional word probabilities.
SFT and RLHF then attempt to further guide the model in terms of steerability and alignment of its output.
In fact, the InstructGPT authors were worried about losing the pre-trained model's underlying probability distribution, so they tried a version that penalizes the model for deviating too far from the original distribution (using a KL penalty). I don't remember them seeing a significant difference in performance.
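To make the KL idea concrete, here is a rough sketch of how such a penalty is commonly folded into the RL reward: take the reward model's score and subtract a scaled log-ratio between the tuned policy and a frozen reference model. This is an illustration under my own assumptions, not the paper's code. The function names, the `beta` value, and the toy numbers are all made up for the example.

```python
import math


def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.02):
    """Return reward minus a scaled log-ratio penalty (hypothetical helper).

    policy_logprobs / ref_logprobs: per-token log-probabilities that the
    tuned policy and the frozen reference model assign to the SAME sampled
    tokens. Their summed difference is a single-sample estimate of the KL
    divergence between the two models on this response.
    beta: penalty strength; the default here is an arbitrary illustrative value.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    log_ratio = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * log_ratio


def categorical_kl(p, q):
    """Exact KL(p || q) for two categorical distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage: the policy is more confident than the reference on both tokens,
# so the log-ratio is positive and the reward gets reduced.
policy_lp = [math.log(0.9), math.log(0.8)]
ref_lp = [math.log(0.5), math.log(0.4)]
print(kl_penalized_reward(1.0, policy_lp, ref_lp, beta=0.1))
print(categorical_kl([0.9, 0.1], [0.5, 0.5]))
```

When the policy matches the reference exactly, the log-ratio is zero and the reward passes through unchanged. The further the policy drifts toward tokens the reference finds unlikely, the more reward it gives up, and `beta` sets how costly that drift is.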