| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sharemywin 1190 days ago
	It was they were' trained using reinforcement learning with human feedback to create the critic.

1 comments

jedberg 1190 days ago

I hadn't thought about human feedback being an adversarial system, but I guess that makes sense, since it's basically a classifier saying "you got this wrong".

link