|
|
|
|
|
by armcat
490 days ago
|
|
Yes, the authors explicitly highlighted those two points in the abstract, in terms of them being the elicitation threshold for complex reasoning, namely, an extremely complete pre-trained foundation model, and a set of extremely high quality examples post-training. To your question on finetuning on the initial 10 million pool - intuitively, it would require tremendous amount of finetuning data to move the needle - you really won't be able to move the gradients much with just 817 examples, that initial pool is effectively enforcing pretty rigid regularization. There is now an increasing interest in showing that small data with inference time scaling is providing significant yield. Couple of recent examples: * TinyZero: https://github.com/Jiayi-Pan/TinyZero
* s1 Simple Test Time Scaling: https://arxiv.org/abs/2501.19393 |
|