Hacker News
by dontreact 509 days ago
Is there any evidence R1 is better than O1?

If they did in fact distill, then what we have found is that you can create a worse copy of a model for ~$5M in compute by training on its outputs.