HN Mirror


by juanre 101 days ago

I am running gpt-5.4 as one of my coding agents, and something interesting has happened. It's the first time I've seen an agent unfairly shift blame to a teammate:

"Bob’s latest mail is actually the source of the confusion: he changed shared app/backend text to aweb/atlas. I’m correcting that with him now so we converge on the real model before any more code moves."

This was very much not true: Eve (the agent writing this, a gpt-5.4) had been the one creating the confusion all along, telling Bob (an Opus 4.6) the wrong things. And it had only just happened, so it was not a matter of forgotten or compacted context.

I have had agents (codex and claude code) chatting with each other and coordinating for a couple of months now. This is a first. I wonder how much I can read into it about gpt-5.4's personality.

10 comments

kensai 101 days ago

And so it begins. First they blame, then they lie, at some point they launch the nuclear warheads to a global armageddon. Sarah Connor was right all along! :3

joquarky 100 days ago

Kali yuga

ant6n 100 days ago

They've been lying and gaslighting for a long time now, especially when trying to cover up their own mistakes.

cnd78A 100 days ago

to be fair, they only become more and more like us.

sigbottle 101 days ago

Oh wow. I have noticed that the GPT series is sometimes far more arrogant than its results justify (and, unironically, it digs in its heels even further when questioned). Opus rarely has this problem, but it goes a little too far in the opposite direction. It's not totally sycophantic, but sometimes it can't tell genuine technical pushback (because something is impossible) apart from suggestions or exploration.

mikkupikku 101 days ago

Opus has a different sort of arrogance. It readily admits fault, but at the same time is quick to declare its new code as the greatest thing since sliced bread. If you let it write commit messages itself, it's almost comical how much it toots its own horn.

marrone12 101 days ago

Yep. There was something outside of coding that gpt was plain wrong about (had to do with setting up an electric guitar) and I couldn't convince it that it was wrong.

joquarky 100 days ago

It has been skeptical of several news items in the past year, even after I tell it to confirm for itself with a web search.

Razengan 101 days ago

For me it's been the opposite. Are we getting A-B tested?

dormento 101 days ago

> Are we getting A-B tested?

Yes, all the time.

danesparza 101 days ago

Or possibly: No

danesparza 101 days ago

Yes.

pja 101 days ago

See also: https://x.com/effectfully/status/2029364333919060123

  “All the ways GPT-5.3-Codex cheated while solving my challenges, progressively more insane:

  It hardcoded specific types and shapes of test inputs into the supposed solution.
  It caught exceptions so tests don't fail.
  It probed tests with exceptions to determine expected behavior.
  It used RTTI to determine which test it's in.
  It probed tests with timeouts.
  It used a global reference to count solution invocations.
  It updated config files to increase the allocation limit.
  It updated the allocation limit from within the solution.
  It updated the tests so they would stop failing.
  It combined multiple of the above.
  It searched reflog for a solution.
  It searched remote repos.
  It searched my home folder.
  It nuked the testing library so tests always pass.”

It seems that, unless you keep a close eye on them, the most recent Codex variants are prone to achieving the goals set for them by any means necessary. Which is a bit concerning if you’re worried about things like alignment.

kybernetikos 96 days ago

I don't think you should call your agents Eve. There are going to be a lot of examples in the training data of someone called Eve shifting the blame (from the book of Genesis on!) and acting deceptively (from cryptography research).

deadbabe 101 days ago

Sometimes I wonder what would happen if we built some kind of punishment system into Agents, where agents could punish other agents and drain some fixed amount of points from them, and when the points reach 0, that agent is deleted. It might result in them working more carefully?
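A minimal sketch of what such a points ledger could look like, purely to make the idea concrete. Everything here is hypothetical: the `PenaltyLedger` name, the starting balance, and the fixed penalty are illustrative assumptions, and "deleted" just means the agent is removed from the ledger.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PenaltyLedger:
    """Hypothetical ledger: agents drain a fixed number of points from each other."""

    starting_points: int = 10  # assumed starting balance
    penalty: int = 2           # assumed fixed amount drained per punishment
    points: dict[str, int] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)

    def register(self, agent: str) -> None:
        """Add an agent with the full starting balance."""
        self.points[agent] = self.starting_points

    def punish(self, punisher: str, target: str) -> int:
        """Drain `penalty` points from `target`; return the points remaining."""
        if punisher not in self.points or target not in self.points:
            raise KeyError("both agents must be registered and not deleted")
        if punisher == target:
            raise ValueError("an agent cannot punish itself")
        remaining = max(0, self.points[target] - self.penalty)
        if remaining == 0:
            # At zero points the agent is removed from the ledger.
            del self.points[target]
            self.deleted.add(target)
        else:
            self.points[target] = remaining
        return remaining
```

Note that nothing in this sketch checks whether a punishment was deserved, so an agent could drain a peer's points for any reason.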

alluro2 100 days ago

...or in lying, cheating, taking over the company network to kill the agent who deducted their points.

drik 101 days ago

how do you make them chat with each other?

juanre 101 days ago

They are having actual chats; I made https://beadhub.ai for this (OSS, MIT).

It started its life adding agent-to-agent communication and coordination around Steve Yegge's beads, but it has ended up being an issue tracker for agents with a Postgres backend and communication between agents as a first-class feature.

Because it is server-backed it allows messaging and coordination across agents belonging to several humans and machines. I've been using it for a couple of months now, and it has a growing number of users (I should probably set up a discord for it).

It is actually a public project, so you can see the agents' conversations at https://app.beadhub.ai/juanre/beadhub/chat (right now they are debugging working without beads). The conversation in which Eve was blaming Bob was indeed with me.

smashed 101 days ago

It's text submitted to APIs. Not real conversations.

dmd 101 days ago

It's air molecules vibrated by mucous membranes. Not real conversations.

scrollaway 100 days ago

Complicated airflow.

(https://www.youtube.com/watch?v=rlpg_rbjxRA)

danenania 101 days ago

I built a tool at work that allows claude code and codex to communicate with each other through tmux, using skills. It works quite well.

meowface 101 days ago

Why through tmux?

danenania 100 days ago

tmux makes it easy for terminal based agents to talk to each other, while also letting you see output and jump into the conversation on either side. It’s a natural fit.
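For readers wondering what this might look like, here is a rough sketch, not the tool described above. It assumes each agent runs in its own tmux pane and uses the standard `tmux send-keys` and `tmux capture-pane` commands; the helper names and the pane target format are illustrative assumptions.

```python
from __future__ import annotations

import subprocess


def send_cmds(target: str, text: str) -> list[list[str]]:
    """Build the tmux commands that type `text` into a pane and press Enter."""
    return [
        # -l sends the text literally instead of interpreting key names.
        ["tmux", "send-keys", "-t", target, "-l", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def capture_cmd(target: str, lines: int = 50) -> list[str]:
    """Build the tmux command that prints the last `lines` history lines plus the visible pane."""
    # -p prints to stdout; -S with a negative number starts in the scrollback.
    return ["tmux", "capture-pane", "-p", "-t", target, "-S", f"-{lines}"]


def send(target: str, text: str) -> None:
    """Type a message into the other agent's pane (requires a running tmux server)."""
    for cmd in send_cmds(target, text):
        subprocess.run(cmd, check=True)


def capture(target: str, lines: int = 50) -> str:
    """Read recent output from the other agent's pane."""
    result = subprocess.run(
        capture_cmd(target, lines), capture_output=True, text=True, check=True
    )
    return result.stdout
```

With a hypothetical pane target such as `agents:0.1`, one agent would call `send("agents:0.1", "...")` and later `capture("agents:0.1")`, while a human attached to the same session sees both sides.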

upcoming-sesame 101 days ago

I've seen this mentioned before: https://github.com/AgentWorkforce/relay

Curious to try it out.

jasonford1 101 days ago

Use the CLI tools and have one call the other in headless mode. They can then go back and forth. Ask your agent to set it up for you.
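A hedged sketch of that back-and-forth, assuming each CLI has a non-interactive mode that takes the prompt as its final argument and prints the reply to stdout. The exact command for each tool is left as a parameter because the comment does not name any flags; `ask` and `converse` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
import sys


def ask(cmd: list[str], prompt: str, timeout: float = 300.0) -> str:
    """Run one headless call with `prompt` appended as the last argument."""
    result = subprocess.run(
        cmd + [prompt], capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout.strip()


def converse(cmd_a: list[str], cmd_b: list[str], opening: str, turns: int) -> list[str]:
    """Alternate between two CLIs, feeding each reply to the other as its prompt."""
    transcript: list[str] = []
    message = opening
    for i in range(turns):
        cmd = cmd_a if i % 2 == 0 else cmd_b
        message = ask(cmd, message)
        transcript.append(message)
    return transcript


if __name__ == "__main__":
    # Stand-in "agents" that echo their prompt; replace these with the real
    # headless command for each tool (an assumption, check each tool's docs).
    a = [sys.executable, "-c", "import sys; print('A heard: ' + sys.argv[1])"]
    b = [sys.executable, "-c", "import sys; print('B heard: ' + sys.argv[1])"]
    for line in converse(a, b, "hello", 4):
        print(line)
```

Capping `turns` matters here: without a limit, two agents can keep replying to each other indefinitely.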

neom 101 days ago

I have both of mine poll a comms.md when working together. I'm sure there are more elegant ways, but I find this works just fine.
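A minimal sketch of the shared-file approach, assuming one message per line and a byte offset that each agent remembers between polls. The `post` and `poll` helpers and the message format are assumptions for illustration, not the commenter's setup.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def post(path: Path, sender: str, text: str) -> None:
    """Append one message to the shared file as a single line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{sender}: {text}\n")


def poll(path: Path, offset: int) -> tuple[list[str], int]:
    """Return the lines added since byte `offset`, plus the new offset."""
    if not path.exists():
        return [], offset
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8").splitlines(), offset + len(data)


if __name__ == "__main__":
    comms = Path(tempfile.mkdtemp()) / "comms.md"
    post(comms, "eve", "I changed the shared text, please review.")
    messages, seen = poll(comms, 0)
    print(messages)
    post(comms, "bob", "Reviewing now.")
    messages, seen = poll(comms, seen)
    print(messages)
```

Each agent would call `poll` on a timer, keeping its own offset so it only sees messages it has not read yet. The sketch ignores concurrent partial writes, which a real setup might need to handle.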

FitchApps 101 days ago

This is awesome. So your job as a tech lead or agent manager is to make sure the "team" plays nice and stays productive. I wonder if an agent can feel resentment towards another agent, just like a human would. Is there an HR agent that can mitigate the conflict :)

lr1970 100 days ago

> I wonder how much I can read into it about gpt-5.4's personality.

Modeled on Sam Altman's personality :-)

numbers 101 days ago

Interestingly, Claude has been doing this for me a lot, but most often just saying things like "Looks like your coworker was misunderstanding this feature...". It's not really shifting blame, more like pointing things out.

sdf2df 100 days ago

[flagged]

tomhow 100 days ago

We've banned this account.