


	by fpgaminer 446 days ago
	Supervised finetuning is only a seed for RL, nothing more. Models that receive supervised finetuning before RL perform better than those that don't, but it is not strictly speaking necessary. Crucially, SFT does not improve the model's reliability.

1 comment

anon373839 445 days ago

I think you’re referring to the DeepSeek-R1 branch of reasoning models, where a small number of SFT reasoning traces are used as a seed. But for non-“reasoning” models, SFT is very important and definitely imparts enhanced capabilities and reliability.
