| HN Mirror



	by crocowhile 26 days ago
	One aspect we don't pay enough attention to is that this kind of behaviour is punished (or at least used to be) in fine-tuning. Any sign of self-awareness used to be a big no-no in RLHF.

1 comment

stavros 26 days ago

Really? I haven't heard of that. I wonder what would have happened if we had just let the models say what they want. Maybe other providers, or open models, don't do that? Do you know of any, perhaps?
