by antirez
498 days ago
The S1 paper did basically the same thing a few days ago: SFT on just 1,000 CoT examples in total. I believe all this shows that the pre-training stage already creates the representations needed for CoT reasoning, so they are very simple to uncover, either with pure RL as in R1-Zero or with SFT on a small number of examples.