| HN Mirror



	by physicsguy 506 days ago
	The interesting part is that distillations of models trained with reinforcement learning are performing so well. That dramatically lowers the cost of certain tasks.

1 comment

azinman2 506 days ago

I thought the distillations were SFT only?

physicsguy 506 days ago

They're SFT on the chain-of-thought output of R1.
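
To make that concrete, here is a rough sketch of how teacher chain-of-thought traces might be turned into SFT examples for a smaller student. Everything in it is an assumption for illustration: `toy_teacher` is a hypothetical stand-in for sampling from the teacher, the `<think>` wrapper is just one plausible way to lay out the completion, and `build_sft_examples` is not anyone's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TeacherTrace:
    prompt: str
    reasoning: str  # chain of thought emitted by the teacher
    answer: str     # final answer emitted by the teacher


def toy_teacher(prompt: str) -> TeacherTrace:
    """Hypothetical stand-in for sampling a response from the teacher model."""
    return TeacherTrace(
        prompt=prompt,
        reasoning="Adding 2 and 2 gives 4.",
        answer="4",
    )


def build_sft_examples(
    prompts: List[str],
    teacher: Callable[[str], TeacherTrace],
    keep: Callable[[TeacherTrace], bool] = lambda trace: True,
) -> List[Dict[str, str]]:
    """Turn teacher traces into prompt/completion pairs for supervised fine-tuning.

    The student is later trained with ordinary next-token loss on the
    completion, so no reinforcement learning is run on the student itself.
    """
    examples = []
    for prompt in prompts:
        trace = teacher(prompt)
        if not keep(trace):  # optional filter, e.g. drop traces with wrong answers
            continue
        # Assumed layout: reasoning wrapped in <think> tags, then the answer.
        completion = f"<think>\n{trace.reasoning}\n</think>\n{trace.answer}"
        examples.append({"prompt": trace.prompt, "completion": completion})
    return examples


if __name__ == "__main__":
    for example in build_sft_examples(["What is 2 + 2?"], toy_teacher):
        print(example)
```

The point of the sketch is that the reinforcement learning happens only in the teacher; the student just imitates the teacher's written-out reasoning and answers.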