| HN Mirror



	by bryan0 509 days ago
	Yes, but these steps were not used in R1-zero, where its reasoning capabilities were trained.

1 comment

And as a result, R1-zero is way too crude to be used directly, which is a good indication that it remains relevant.