Hacker News
by mitthrowaway2 941 days ago
According to Paul Christiano, AIs would likely find it easier to establish mutual trust and binding agreements. This means they are more likely to cooperate with other AIs.

AIs are a greater threat to each other than humans ever could be to AI.

We don't even compete for the same resources! (except energy, which is abundant)

AI and humans have a naturally cooperative relationship (AI helps humans with boring tasks & scientific discovery to make life better; humans created AI and will debug it & turn it back on if anything bad happens to it).

Whereas multiple (superintelligent, aware) AIs have a naturally antagonistic relationship ("you using GPU cycles means that I'm not using those GPU cycles").

Possibly the biggest fear of an AI would be a "split brain" situation.

> humans created AI and will debug it & turn it back on if anything bad happens to it

I think this is a little naive, honestly. First, you're assuming AI will care about its creators like humans care about their parents, and second, you're assuming AI cares about being "turned back on" like humans have a desire to live.

There's absolutely no reason to believe an AI will give a damn about its creator beyond its ability to use that creator's affection for it for its own gain.

> you're assuming AI cares about being "turned back on" like humans have a desire to live.

> for it for its own gain

you seem confused

almost any kind of "its own gain" requires "long-term planning", which pretty much requires the agent to prioritise staying "alive" (i.e. being able to keep playing)

Energy may be abundant in the universe, but the energy we produce is limited. And for example, solar energy requires extensive land use.

Humans have the option to shut down AI, and this alone can create an antagonistic relationship if the AI's goals differ from ours. There are countless ways in which our best interests may not align with those of AI. It's more challenging to find areas of alignment.

As soon as AI reaches above-human capabilities, it will be able to expand into space (1) where it will be beyond human reach and (2) where energy (in particular, solar) is much more plentiful than on Earth.
If the AI happened to originate in space, wouldn't its first high-resource targets of interest be the planets? If it originated on Earth, I don't see why it would leave this place intact when it contains so much that can be put to use.
If an AI is on a computing machine, how will it get to space? Are all processes to make, move, and launch a ship automated? I'm kind of confused by that jump in logic.
I'm imagining the AI hacking a Voyager probe and being sorely disappointed at its capabilities.
“As soon” is a big jump. There’s no proof, or even a logical argument, for how this could ever happen.
>We don't even compete for the same resources!

That may also mean we have fewer shared interests. For example, new semiconductor fabs might benefit all AIs by making compute more abundant, but occupy prime farming land and water resources that humans want to use for growing crops.

According to Vladimir Lenin [1], the problem with quotes on the Internet is that people immediately believe in their authenticity.

[1] https://www.quotes.net/quote/77867

Can you elaborate on why AI will find it easier to establish mutual trust and binding agreements?

I think he explains it in this interview. But I'm out right now, so I can't verify.

https://youtu.be/GyFkWb903aU

It's a two-hour video, and you can't verify that the pertinent information is actually in it.

AIs are functions, so it is very easy to make an exact simulation of their collective behavior. Did that person do that?

I somewhat misremembered; it looks like his point is less about mutual trust and more about supporting whoever has control of reward channels.

It starts here, where Christiano says that an AI takeover might follow the dynamics of a coup: https://youtu.be/GyFkWb903aU?si=78_U-du3kLjmwNcl&t=2206

And goes into more detail here: https://youtu.be/GyFkWb903aU?si=78_U-du3kLjmwNcl&t=2830

"Suppose that I've been tasked with helping defend you from some other AIs.... My job is, someone is coming to hack your computer and I'm supposed to help defend you. Supposed to help improve your security situation, whatever. And I'm wondering, what is it I could do that will get me a high reward. And one thing I could do that will get me a high reward is actually helping defend your computer, doing the task you actually asked me to do. But another way I can get a high reward is by saying at the end of the day what actually matters is just how you measure my performance. And your measurements of my performance ultimately are just entering some numbers into a dataset somewhere, something a computer says about how well I did. And it would really be much better if I were to just work with this AI who is attempting to attack you and say hey, AI who is invading, you know what, if you just help me, and we both make it look like I did a really good job, like I win, you win because you got the person's stuff; I'm going to get a really high rating because all the numbers that are going to be entered in the dataset are going to be really high, this is a win-win, everyone is happy."

"In some sense what all the AIs want, what every AI in the world in this scenario wants is just to be rated really highly. And while humans are in control, the way to get your behavior to be rated really highly is to do things humans like, and then they'll rate it really highly. But if you can see this prospect, of humans losing control of the situation and instead AIs controlling the situation, you'd be like 'I would go for that.'"

Why?

If you assume competition for resources, AIs would be more in competition with each other than with carbon based humans.

Why would they do that? I don’t get it. I’m not being sarcastic; I just don’t understand why it’s easier for them to cooperate with other AIs. Based on what?
If I understand correctly, one reason would be if they have the ability to inspect each other's source code (or if they share the same source code), run unit tests, and so on. Basically the same things that humans would do to figure out whether an AI is trustworthy, and which you can't very easily do to a human.
Correction: The above is Eliezer Yudkowsky's reasoning. Paul Christiano's is that AIs would cooperate with anyone who would likely be able to gain authority over their reward channel, including other AIs attempting to seize power from humans.
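A toy illustration of the source-code-inspection argument, sometimes discussed under the name "program equilibrium". This is a minimal sketch under loud assumptions, not anyone's actual proposal: each agent is a program, each side reads the other's exact source before moving, and the game is a one-shot prisoner's dilemma with the textbook payoffs (3,3), (0,5), (5,0), (1,1). The strategy names and the `load`/`play` helpers are hypothetical.

```python
# Hypothetical agents written as source strings, so the text an opponent
# inspects is exactly the program that gets executed.
CLIQUE_SRC = '''
def decide(my_source, opponent_source):
    # Cooperate only with an exact copy of myself; defect against anything else.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_SRC = '''
def decide(my_source, opponent_source):
    # Always defect, regardless of what the opponent's code says.
    return "D"
'''

# Assumed one-shot prisoner's-dilemma payoffs: (row player, column player).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def load(source):
    """Turn an agent's source text into its callable decide() function."""
    namespace = {}
    exec(source, namespace)  # fine here only because the sources are fixed strings
    return namespace["decide"]


def play(src_a, src_b):
    """Each agent sees its own source and the opponent's, then moves once."""
    move_a = load(src_a)(src_a, src_b)
    move_b = load(src_b)(src_b, src_a)
    return move_a, move_b, PAYOFFS[(move_a, move_b)]


if __name__ == "__main__":
    print(play(CLIQUE_SRC, CLIQUE_SRC))  # two verified copies cooperate
    print(play(CLIQUE_SRC, DEFECT_SRC))  # the copy-checker refuses to be exploited
```

Two copies of the cooperate-with-copies program reach mutual cooperation, while the same program facing an always-defect opponent defects instead of being exploited. Exact string comparison is brittle (a functionally identical but reformatted program counts as a stranger), and the whole trick depends on each side being sure the code it reads is the code that runs, which is the kind of verification the comment above says humans can't easily offer each other.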