| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by arrow7000 1191 days ago
	Maybe an adversarial approach was used in training these models in the first place?

1 comments

sharemywin 1191 days ago

It was they were' trained using reinforcement learning with human feedback to create the critic.

link

jedberg 1190 days ago

I hadn't thought about human feedback being an adversarial system, but I guess that makes sense, since it's basically a classifier saying "you got this wrong".

link