by Jimmc414 504 days ago
The big news with DeepSeek-R1 is that it takes only ~800k samples of 'good' RL-generated reasoning to turn other models into reasoning models.

They successfully distilled the reasoning capabilities of larger models into much smaller ones. For example, their 14B model outperforms other 32B models.