Hacker News
by
bosco_mcnasty
700 days ago
Could the generator and the challenger be cross-trained against each other, so that both actually get better? Something like a generative-challenger network (GCN)?
1 comment
langcss
697 days ago
I think this is how RLHF actually works.
https://huyenchip.com/2023/05/02/rlhf.html#3_1_reward_model
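For anyone who wants the reward-model part in concrete terms, here is a rough sketch of the pairwise preference loss commonly used for RLHF reward models, -log(sigmoid(r_chosen - r_rejected)). The scalar "features", the one-weight linear reward, and the function names are all made up for illustration and are not taken from the linked post. As far as I understand the usual setup, the reward model is fit to human preference pairs first and then held fixed while the policy is optimized against it, so it is related to, but not quite the same as, training both sides at once.

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), written in a numerically stable form."""
    diff = r_chosen - r_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def train_toy_reward_model(pairs, lr=0.1, steps=200):
    """Fit a hypothetical one-weight reward r(x) = w * x on (chosen, rejected) pairs."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x_chosen, x_rejected in pairs:
            dx = x_chosen - x_rejected
            diff = w * dx
            # d/dw of -log(sigmoid(w * dx)) is -dx * sigmoid(-w * dx)
            grad += -dx / (1.0 + math.exp(diff))
        w -= lr * grad / len(pairs)
    return w

# Toy preference data (assumed): the chosen answer always has the larger feature.
toy_pairs = [(1.0, 0.0), (2.0, 0.5), (0.3, -1.0)]
w = train_toy_reward_model(toy_pairs)
print("learned weight:", w)
print("loss on first pair:", pairwise_reward_loss(w * 1.0, w * 0.0))
```

The loss is small when the reward model already scores the preferred answer higher and grows when it ranks the pair the wrong way round, which is what pushes the weight in the right direction here.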