| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lvwerra 1208 days ago
	One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF to show case how to do the whole training loop and also because it often creates quite funny answers. There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models and we hoped to speed up the collective learning on how to best do RLHF by releasing a tutorial to setup RLHF training.

1 comments

mcaledonensis 1207 days ago

Model capability is mostly set, before the alignment even starts. Alignment turns it from a super-smart cat into a friendly dog. But it can't turn a parrot into a human. It can't even teach the parrot to count ;)

link