Hacker News
by antirez 498 days ago
The s1 paper did basically the same thing a few days ago: SFT on just 1,000 CoT examples in total.

I believe all this shows that the pre-training stage already creates the representations needed for CoT reasoning, so they are very easy to uncover, either with pure RL as in R1-Zero or with SFT on a small number of examples.