| HN Mirror


	by nlpnerd 507 days ago
	That is a slight exaggeration, or rather an extrapolation, on the author's part. What happened was that RL training led to some emergent behavior in R1-Zero (chain-of-thought reasoning and reflection) without the model being prompted or explicitly trained for it. I don't see what is so domain-specific about that, though.