HN Mirror



	by nielsole 508 days ago
	> We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models.

	I found this statement from the paper to be at odds with what you cited, but I guess they mean SFT+RL would be better than either SFT or RL alone.

1 comment

rahimnathwani 508 days ago

I think they're saying that some reasoning patterns which large models can learn using only RL (i.e. without the patterns existing in the training data) can't be learned by smaller models in the same way. They have to be 'taught' through examples provided during SFT.
