Hacker News new | ask | show | jobs
by mlyle 483 days ago
I can imagine how it could happen by mistake. Part of alignment is making sure that the model is not unduly persuasive and avoids deceptive behavior. But there's plenty of use cases for the opposite of this: persuasive AI that deceives/hides shortcomings.

It's interesting that attempting to reverse one characteristic of alignment could reverse others.