| From the R1 paper: "In this study, we demonstrate that reasoning capabilities can be significantly
improved through large-scale reinforcement learning (RL), even without using supervised
fine-tuning (SFT) as a cold start. Furthermore, performance can be further enhanced with
the inclusion of a small amount of cold-start data." Is this cold-start data what OpenAI is claiming is their output? If so, what's the big deal? |