Hacker News
by natdempk 497 days ago
I don’t think anyone is really suggesting that they stole the CoT (chain of thought) or that it was leaked, but rather that the final o1 outputs were used to make training the base model and the reasoning components easier.

The RL is done on problems with verifiable answers. I’m not sure how o1 slop would be at all useful in that respect.
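To illustrate what "verifiable answers" means here: the reward comes from checking the model's final answer against a known ground truth, not from imitating another model's text. The sketch below is a minimal, hypothetical example. It assumes the model writes its final answer as `\boxed{...}`, and the helper names `extract_final_answer` and `verifiable_reward` are made up for illustration. The thread does not say how any particular lab actually checks answers.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion.

    The \\boxed{} convention is an assumption for this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer, no reward
    try:
        # Compare numerically so that "1/2" and "0.5" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference.strip()) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string comparison.
        return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_reward("... so the result is \\boxed{1/2}", "0.5"))
    print(verifiable_reward("... so the result is \\boxed{3}", "4"))
```

Because the checker only looks at the final answer against a reference, another model's outputs add nothing to the reward signal itself; they could at most serve as supervised warm-start data.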