HN Mirror


	by scribu 449 days ago
	If the base models already have the “reasoning” capability, as the authors claim, then it’s not surprising that they were able to get to SOTA using a relatively small amount of compute for RL fine-tuning. I love this sort of “anti-hype” research. We need more of it.