

	by Jimmc414 504 days ago
	The big news with DeepSeek-R1 is that it only takes ~800k samples of 'good' reasoning, generated by the RL-trained model, to turn other models into reasoners with plain supervised fine-tuning. They successfully distilled the reasoning capabilities of a large model into much smaller ones. For example, their 14B distilled model outperforms larger 32B models such as QwQ-32B-Preview.
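	To make the "samples of good reasoning" idea concrete, here is a toy sketch of one way such a distillation set could be built. It assumes rejection sampling: draw several reasoning traces from the teacher per prompt, keep only a trace whose final answer matches a known reference, and use the survivors as (input, target) pairs for fine-tuning a smaller student. Everything here is hypothetical and for illustration only. `toy_teacher` stands in for the RL-trained model, and `curate`, `extract_answer` and `to_sft_pairs` are made-up helpers, not DeepSeek's code.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    trace: str  # teacher's reasoning trace plus final answer


def toy_teacher(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for the RL-trained teacher.

    Solves "a+b" prompts with a short reasoning trace and is right ~70% of the time.
    """
    a, b = map(int, prompt.split("+"))
    answer = a + b if rng.random() < 0.7 else a + b + 1
    return f"<think>Add {a} and {b}.</think> Answer: {answer}"


def extract_answer(trace: str) -> str:
    # Take whatever follows the last "Answer:" marker.
    return trace.rsplit("Answer:", 1)[-1].strip()


def curate(prompts, references, n_candidates=4, seed=0):
    """Rejection sampling: keep the first teacher trace whose answer matches the reference."""
    rng = random.Random(seed)
    dataset = []
    for prompt, ref in zip(prompts, references):
        for _ in range(n_candidates):
            trace = toy_teacher(prompt, rng)
            if extract_answer(trace) == ref:
                dataset.append(Sample(prompt, trace))
                break  # one verified trace per prompt is enough for this sketch
    return dataset


def to_sft_pairs(dataset):
    # A student would then be trained with ordinary next-token cross-entropy on these pairs.
    return [{"input": s.prompt, "target": s.trace} for s in dataset]


if __name__ == "__main__":
    prompts = [f"{i}+{i + 1}" for i in range(5)]
    references = [str(2 * i + 1) for i in range(5)]
    for pair in to_sft_pairs(curate(prompts, references, n_candidates=8)):
        print(pair)
```

	The filter is what makes the samples 'good': the student only ever imitates traces that reached a verified answer. The fine-tuning step itself is omitted, and the real pipeline works at a far larger scale (~800k samples) with a real model as the teacher.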