Hacker News
by zaptrem 423 days ago
I think when they were figuring out RLHF, they avoided this by interleaving RLHF updates with normal cross-entropy gradient updates on the training set.
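To make the idea concrete, here is a toy sketch of interleaving, not anyone's actual training code. Everything in it is an assumption for illustration: a three-token softmax "policy", a made-up data distribution `P_DATA`, a made-up reward `REWARD` that favors the rarest token, and exact expected gradients in place of sampled PPO updates. The `train` helper is hypothetical too.

```python
import math

# Hypothetical toy setup: 3 tokens, a fixed "training set" distribution,
# and a reward that only pays out for the token that is rare in the data.
P_DATA = [0.5, 0.4, 0.1]
REWARD = [0.0, 0.0, 1.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits):
    # Cross entropy of the policy against the training distribution.
    pi = softmax(logits)
    return -sum(p * math.log(q) for p, q in zip(P_DATA, pi))


def expected_reward(logits):
    pi = softmax(logits)
    return sum(q * r for q, r in zip(pi, REWARD))


def train(steps, interleave, lr=0.5):
    # Start from a "pretrained" policy that matches the data exactly.
    logits = [math.log(p) for p in P_DATA]
    for _ in range(steps):
        # RL step: gradient ascent on expected reward J = sum_i pi_i * r_i.
        # For a softmax policy, dJ/dlogit_k = pi_k * (r_k - J).
        pi = softmax(logits)
        j = sum(q * r for q, r in zip(pi, REWARD))
        logits = [t + lr * q * (r - j) for t, q, r in zip(logits, pi, REWARD)]
        if interleave:
            # Cross-entropy step: gradient descent on CE against P_DATA.
            # For a softmax policy, dCE/dlogit_k = pi_k - p_k.
            pi = softmax(logits)
            logits = [t - lr * (q - p) for t, q, p in zip(logits, pi, P_DATA)]
    return logits


rl_only = train(200, interleave=False)
mixed = train(200, interleave=True)
print("RL only:     CE=%.3f reward=%.3f" % (cross_entropy(rl_only), expected_reward(rl_only)))
print("Interleaved: CE=%.3f reward=%.3f" % (cross_entropy(mixed), expected_reward(mixed)))
```

In this toy, the RL-only run collapses onto the rewarded token and its cross entropy on the data blows up, while the interleaved run still gains reward over the starting policy but stays much closer to the training distribution.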