by Jimmc414
504 days ago
The big news with DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. They successfully distilled the reasoning capabilities of larger models into much smaller ones, e.g. their 14B model outperforms other 32B models.