by simsla
96 days ago
Typical stages of training for these models are:

Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)

And sometimes...
- Some more customer-specific fine-tuning.

Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
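
To make the "just swapping the dataset" point concrete, here is a toy sketch in plain Python. It is not how any real lab trains a model: the one-parameter model, the `train` and `mse` helpers, the synthetic datasets and the learning rates are all made up for illustration. The only thing it is meant to show is that the same training loop serves both stages, and only the data and optimiser settings change.

```python
import random


def train(weight, dataset, lr, epochs):
    """Plain SGD on squared error for a one-parameter model y = weight * x."""
    for _ in range(epochs):
        for x, y in dataset:
            grad = 2 * (weight * x - y) * x  # d/dw of (w*x - y)^2
            weight -= lr * grad
    return weight


def mse(weight, dataset):
    """Mean squared error of the model on a dataset."""
    return sum((weight * x - y) ** 2 for x, y in dataset) / len(dataset)


rng = random.Random(0)

# Hypothetical toy data: a large "pretraining" set following y = 2x,
# and a small "SFT" set following y = 3x.
pretrain_data = [(x, 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(200))]
sft_data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(20))]

# Stage 1: "pretraining" from scratch with a larger learning rate.
w_pretrained = train(0.0, pretrain_data, lr=0.1, epochs=5)

# Stage 2: "SFT" starts from the pretrained weight. Same loop,
# different dataset and a smaller learning rate.
loss_before_sft = mse(w_pretrained, sft_data)
w_finetuned = train(w_pretrained, sft_data, lr=0.01, epochs=50)
loss_after_sft = mse(w_finetuned, sft_data)

print(f"pretrained w = {w_pretrained:.3f}, fine-tuned w = {w_finetuned:.3f}")
print(f"SFT loss before = {loss_before_sft:.4f}, after = {loss_after_sft:.4f}")
```

The RL stage is the one that doesn't fit this pattern, since it changes the objective and not just the data, which is why swapping things around before RL is the cheaper option.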