| HN Mirror

	by throwaway4aday 514 days ago
	That's essentially what R1 Zero is showing:

	> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.