Hacker News new | ask | show | jobs
by TeMPOraL 63 days ago
This only works if AIs can't read each other well enough to stop themselves from ever fighting.

So, back way before ChatGPT era, the folks over at AI safety/X-risk think sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In such case, your plan just doesn't work.

But that goes out of the windows if the AIs are both opaque bags of floats, uncomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their values and behaviors, so they can't trust each other, so they'll have to fight it out. In such scenario, your idea might just work.

Who knew that brute-forcing our way into AGI instead of taking more engineered approach is what offers us out one chance at saving ourselves by stalemating God before it's born.

(I also never realized that interpretability might reduce safety.)

3 comments

> So, back way before ChatGPT era, the folks over at AI safety/X-risk think sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In such case, your plan just doesn't work.

See "The Forbin Project": https://vimeo.com/584593423

Yeah, they don't even understand themselves (and this seems unlikely to change[0] but God knows), and how would you even get access to the enemy AGI's weights?

And even if you did, wouldn't you need infinite computation to simulate every permutation of the neural net? (Your own, and the enemy's?)

Also the whole thing implies a superintelligence would be perfectly rational, which is a pretty funny assumption. Relative to animals we are already superintelligent. How's that super-rationality going for us? xD

A better frame here is replicators, I think. The thing that spreads doesn't have to be rational, or better quality or whatever. It just has to be better at spreading.

That ends up looking less like Betamax, more like VHS, or less like Lisp and more like... JavaScript. Whatever the AGI equivalent of JavaScript would look like.

[0] https://xkcd.com/1163/

This is such a good comment. You're essentially removing their ego - which is what humans do as opoque posturing to each other, to present a certain image. This is most prevelent in successful elites, which in 2026 happen to be silicon valley ai share holders. They control the technology and manipulate it to their image. By making models open source and transparent it cuts out this psychopathy of ego which has plagued all our previous technologies.