	by mountainriver 449 days ago
	This is exactly it: it’s the result of RLVR (reinforcement learning with verifiable rewards), where we force the model to reason its way to an answer when that information isn’t in its base training.