Hacker News new | ask | show | jobs
by sharemywin 1190 days ago
It was they were' trained using reinforcement learning with human feedback to create the critic.
1 comments

I hadn't thought about human feedback being an adversarial system, but I guess that makes sense, since it's basically a classifier saying "you got this wrong".