Hacker News
by
natdempk
497 days ago
I don’t think anyone is really suggesting they stole the CoT or that it leaked, but rather that the final o1 outputs were used to train the base model and reasoning components more easily.
valine
497 days ago
The RL is done on problems with verifiable answers. I’m not sure how o1 slop would be at all useful in that respect.