by bosco_mcnasty 700 days ago
Could the generator and the challenger be cross-trained against each other, so that both actually get better? Something like a generative-challenger network (GCN)?
1 comment

I think this is how RLHF actually works. https://huyenchip.com/2023/05/02/rlhf.html#3_1_reward_model