Hacker News
by alex_sf 1126 days ago
Taking RLHF into account: it's not actually generating the most plausible completion; it's generating one that's worse.