Hacker News
by tnzk 1289 days ago
> 6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model

Can someone provide a pointer to an article that elaborates on this part?