Y
Hacker News
new
|
ask
|
show
|
jobs
by
arrow7000
1191 days ago
Maybe an adversarial approach was used in training these models in the first place?
1 comments
sharemywin
1191 days ago
It was they were' trained using reinforcement learning with human feedback to create the critic.
link
jedberg
1190 days ago
I hadn't thought about human feedback being an adversarial system, but I guess that makes sense, since it's basically a classifier saying "you got this wrong".
link