HN Mirror



	by tnzk 1289 days ago
	> 6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model

	Can someone provide a pointer to an article that elaborates on this part?