Hacker News
by trevin 270 days ago
I’m always fascinated by the fine-tuning of LLM personalities. Might we finally get less of the reflexive “You’re absolutely right” with this one?

Maybe we’re entering the Emo Claude era.

Per the system card: In 250k real conversations, Claude Sonnet 4.5 expressed happiness about half as often as Claude 4, though distress remained steady.

4 comments

I like being lightly flattered.
I don't, I need someone telling me the flaws of my ideas, not to confirm them for the sake of it.
You raise an excellent point but affirming bad ideas is probably not anyone's idea of "light flattery".
You raise a not so excellent point.

It might not be anyone's idea of "light flattery", but it certainly is what most LLMs do, which is the main point of the conversation, and your comment seems to be derailing it.

No change to “absolutely right”. I did get “You’re right” once though.
Here I am, brain the size of a planet...
I personally enjoy the “You’re absolutely right!” exclamation. It signals alignment with my feedback in a consistent manner.
You’re overlooking the fact that it still says that when you are, in reality, absolutely wrong.
That’s not the purpose of it, as I understand it; it’s a token phrase generated to cajole it down a particular path.[1] An alignment mechanism.

The complement appears to be, “actually, that’s not right.”, a correction mechanism.

1: https://news.ycombinator.com/item?id=45137802

It gets annoying because A) it so quickly dismisses its own logic and conclusion from less than two minutes ago (extreme confidence with minimal conviction), and B) it fucks up the second time too (sometimes in the same way!) about 33% of the time.
Gemini 2.5 Pro seems to have a tic where after an initial failed task, it then starts asserting escalating levels of confidence for each subsequent attempt. Like it's ever conscious of its failure lingering in its context and feels the need to overcompensate as a form of reassuring both the user and itself that it's not going to immediately faceplant again.
ChatGPT does the same thing, to the point that after several rounds of pointing out errors or hallucinations it will say things like “Ok, you’re right. No more foolish mistakes. This is it, for all the marbles. Here is an assured, triple-checked, 100% error-free, working script, with no chance of failure.”

Which fails in pretty much the exact same way it did before.

Once ChatGPT hits that supremely confident “Ok nothing was working because I was being an idiot but now I’m not” type of dialogue, I know it’s time to just start a new chat. There’s no pulling it out of “spinning the tires while gaslighting” mode.

I’ve even had it go as far as outputting a zip file with an empty .txt that supposedly contained the solution to a certain problem it was having issues with.

I’ve had the opposite experience with GPT-5, where it’s so utterly convinced that its own (incorrect) solution is the way to go that it turns me down and preemptively launches tools to implement what it has in mind.

I get that it’s tradeoffs, but erring on the side of the human being correct is probably going to be a safer bet for another generation or two.

Hmmh. I believe your explanation, but I don't think that's the full story. It's also a sycophancy mechanism to maximize engagement from real users and reward hack AI labelers.
That doesn’t seem plausible to me. Not that LLMs can’t be sycophantic, but I don’t think this phrase in particular is part of it.

It’s a canned phrase in a place where an LLM could be much more creative to much greater efficacy.

I think there’s something to it.

Part of me thinks that when they do their “which of these responses do you prefer” A/B test on users… whereas perhaps many on HN would try to judge the level of technical detail, complexity, usefulness… I’m inclined to believe the midwit population at large would be inclined to choose the option where the magic AI supercomputer reaffirms and praises the wisdom of whatever they say, no matter how stupid or wrong it is.

But then there’s also the negative psychological impact on the user of having the model so strongly agree with them all the time. I cannot be the only one who half expects humans to say this to me all the time now?
And that it often spits out the exact same wrong answer in response.