by tnzk
1289 days ago
> 6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model

Can someone provide a pointer to an article that elaborates on this part?