Hacker News
by lvwerra 1161 days ago
One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF, to showcase how to do the whole training loop, and also because it often produces quite funny answers.

There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models, and we hoped to speed up the collective learning on how best to do RLHF by releasing a tutorial on how to set up RLHF training.

1 comment

Model capability is mostly set before alignment even starts. Alignment turns it from a super-smart cat into a friendly dog. But it can't turn a parrot into a human. It can't even teach the parrot to count ;)