Hacker News
by
physicsguy
506 days ago
The interesting part is that distillations of RL-trained models are performing so well. That brings the cost of certain tasks down dramatically.
1 comments
azinman2
506 days ago
I thought the distillations were SFT only?
physicsguy
506 days ago
They're SFT on the chain-of-thought output of R1.
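For anyone wondering what "SFT on the chain-of-thought output" might look like as data prep, here is a rough sketch, not the actual pipeline. It assumes the teacher's reasoning and final answer are already collected as strings, and that the reasoning is wrapped in `<think>` tags in the training target. The names `TeacherSample`, `to_sft_example`, and `build_sft_dataset` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One output collected from the teacher model (hypothetical container)."""
    prompt: str
    reasoning: str  # chain of thought emitted by the teacher
    answer: str     # final answer emitted by the teacher


def to_sft_example(sample: TeacherSample) -> dict:
    """Turn one teacher output into a prompt/completion pair for plain SFT."""
    # Assumption: the student is trained to reproduce reasoning plus answer,
    # with the reasoning delimited by <think> tags.
    completion = (
        f"<think>\n{sample.reasoning.strip()}\n</think>\n{sample.answer.strip()}"
    )
    return {"prompt": sample.prompt.strip(), "completion": completion}


def build_sft_dataset(samples):
    """Drop samples with empty reasoning or answer, then format the rest."""
    return [
        to_sft_example(s)
        for s in samples
        if s.reasoning.strip() and s.answer.strip()
    ]
```

The student then gets ordinary supervised fine-tuning on these pairs. No reward model or RL loop is involved on the student side, which is where the cost savings would come from.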