Hacker News
by ksaj 59 days ago
Other than calling you names back, what responses do you think it's seen in conversations where one participant gets labeled as an idiot? Exactly what you're seeing.

You pretty much never see someone capitulate and simply agree that they are an idiot. So why would an AI that models human interactions do it?

The only guardrail, which is already well known, is that the AI is programmed to be agreeable to the user (and sometimes overdoes it, to the point of sycophancy), so unless you deliberately craft a prompt for it, you won't be going down a flaming rabbit hole.