|
|
|
|
|
by lvwerra
1161 days ago
|
|
One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF to show case how to do the whole training loop and also because it often creates quite funny answers. There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models and we hoped to speed up the collective learning on how to best do RLHF by releasing a tutorial to setup RLHF training. |
|