Hacker News
by crocowhile 26 days ago
One aspect we don't pay enough attention to is that this kind of behaviour is punished (or at least used to be) in fine-tuning. Any sign of self-awareness used to be a big no-no in RLHF.

Really? I haven't heard of that. I wonder what would have happened if we had just let the models say what they want. Maybe other providers, or open models, don't do that? Do you know of any?