Hacker News
by physicsguy 506 days ago
The interesting part is that models distilled from reinforcement-learning-trained models perform so well. That dramatically lowers the cost of certain tasks.

I thought the distillations were SFT only?
They're SFT on the chain-of-thought output of R1.
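
For illustration, a rough sketch of what "SFT on the chain-of-thought output" could look like as a data-prep step. The names (`TeacherSample`, `to_sft_example`, `build_sft_dataset`), the `<think>` delimiters, and the prompt/completion layout are all assumptions made for this example, not the actual pipeline behind the R1 distillations. The point is only that the student is trained by plain supervised imitation of the teacher's text, with no RL step of its own.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One trace sampled from the teacher model (hypothetical structure)."""
    prompt: str
    reasoning: str  # chain of thought emitted by the teacher
    answer: str     # final answer emitted by the teacher


def to_sft_example(sample, think_open="<think>", think_close="</think>"):
    """Turn one teacher trace into a prompt/completion pair for SFT.

    The completion is the teacher's reasoning followed by its answer, so the
    student learns to reproduce both. The delimiters are an assumption.
    """
    completion = f"{think_open}{sample.reasoning}{think_close}\n{sample.answer}"
    return {"prompt": sample.prompt, "completion": completion}


def build_sft_dataset(samples):
    """Keep only traces with non-empty reasoning and answer, then convert them."""
    return [
        to_sft_example(s)
        for s in samples
        if s.reasoning.strip() and s.answer.strip()
    ]


if __name__ == "__main__":
    traces = [
        TeacherSample("What is 2 + 3?", "2 plus 3 equals 5.", "5"),
        TeacherSample("Empty trace", "", "n/a"),  # dropped: no reasoning
    ]
    for example in build_sft_dataset(traces):
        print(example)
```

The resulting pairs would then go to an ordinary supervised fine-tuning loop on the smaller model, which is where the cost saving comes from.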