Hacker News
by alex_sf 1126 days ago
Taking RLHF into account: it's not actually generating the most plausible completion; it's generating one that's worse.