


	by antirez 498 days ago
	The S1 paper did basically the same thing a few days ago: SFT on just 1,000 CoT examples in total. I believe all this shows that the pre-training stage already creates the representations needed for CoT reasoning, so they are very easy to uncover, either with pure RL as in R1-Zero or with SFT on a small number of examples.