Hacker News
by bigyabai, 197 days ago
RLHF is basically a fancy, overengineered GAN. Most of the industry could see that DPO was more efficient for fitting to human behavior.
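For anyone who hasn't seen it written out, here is a rough sketch of the per-example DPO loss, to show why it gets called more efficient: it is a plain classification-style loss over preference pairs, with no separate reward model and no RL loop. The function name, argument names, and the `beta=0.1` default are my own illustrative choices, not from any particular library, and the inputs are assumed to be summed log-probabilities of each full response under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (illustrative sketch; names are hypothetical).

    Each argument is the summed log-probability of a whole response
    under either the policy being trained or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the preferred response relative to the reference. In practice this would be computed in batches with an autodiff framework, which this sketch leaves out.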