by skywhopper 670 days ago
Wait, why wouldn’t RLHF influence word choices?

I didn't say it wouldn't (or rather couldn't); I said it was unlikely under the selected hypothesis, given standard training data versus RLHF iterations.