by rahimnathwani
505 days ago
I think they're saying that some reasoning patterns which large models can learn using only RL (i.e. without the patterns existing in the training data) can't be learned by smaller models in the same way. Smaller models have to be 'taught' those patterns through examples provided during SFT.