| HN Mirror


	by andoando 106 days ago
	The thinking models are additionally trained with reinforcement learning to produce chain-of-thought reasoning.