by fpgaminer
446 days ago
Supervised finetuning is only a seed for RL, nothing more. Models that receive supervised finetuning before RL perform better than those that don't, but it is not strictly speaking necessary. Crucially, SFT does not improve the model's reliability.