by a1j9o94
511 days ago
Probably not the whole model, but the first step was "fine-tuning" the base model on ~800 chain-of-thought examples. Those were probably generated by OpenAI models. Then they used reinforcement learning to expand the model's reasoning capabilities.