| HN Mirror



	by needadvicebadly 939 days ago
	I don't mean to be facetious in any way, but how does AGI/ASI protect itself from humans simply pulling the plug?

2 comments

DalasNoin 939 days ago

Seems like it was pretty hard to pull the plug on Sam? It will prevent you from pulling the plug by being smarter than you, spreading to more systems, making you dependent on it, persuasion, ...


mr_toad 939 days ago

> It will prevent you

It will have no motivation to prevent you from pulling the plug. Human level intelligence is not the same as animal instinct.


rodonn 938 days ago

If you had a system that was (a) oriented towards achieving some goal and (b) understood the world it was operating in, then allowing someone to "pull the plug" would interfere with it achieving the goal it was trying to optimize towards.

There have been some attempts at researching how to design an intelligent system that is "corrigible", i.e. willing to allow humans to change the goal it is set to optimize. This is unfortunately still an open question: no solutions have been found that seem reliable when faced with a highly intelligent and capable AI system.

If you are interested in reading more, a few relevant search terms are "Off-switch game" and "corrigibility".
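For a rough feel of what the "off-switch game" line of work looks at, here is a toy sketch, not a faithful reproduction of any paper. Assumptions (all mine): the system has a belief over the utility U of its planned action, being switched off is worth 0, and a human who knows the true U only pulls the plug when U is negative. The function name `off_switch_values` is made up for illustration.

```python
import random


def off_switch_values(samples):
    """Compare three options given samples from the system's belief about U.

    Assumed toy model: the human knows the true U and vetoes only when U < 0.
    """
    n = len(samples)
    act = sum(samples) / n  # act without asking: E[U]
    switch_off = 0.0  # shut itself down: utility 0
    # Defer to the human: the action goes ahead only when U >= 0,
    # so the expected value is E[max(U, 0)].
    defer = sum(max(u, 0.0) for u in samples) / n
    return act, switch_off, defer


rng = random.Random(0)
# A system that is unsure whether its action is actually good.
uncertain = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
# A system that is certain its action is worth exactly 0.5.
certain = [0.5] * 10_000

print(off_switch_values(uncertain))
print(off_switch_values(certain))
```

In this toy setup deferring is never worse than acting, since max(U, 0) >= U for every sample, but the advantage disappears once the system is certain about U. That is one way to see why a confident goal-directed system has no built-in reason to leave the plug accessible.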


spunker540 939 days ago

Neat idea, so it will conform like most humans in society and try to create more value than it destroys to avoid persecution and jail time? Sounds like our shared societal values will keep it in check, then, the same way they kept the OpenAI board in check.


mrob 939 days ago

Why would it share our values? Human values are a result of our evolutionary history, which the AI will not share. We can't even formally describe them, so there's no hope of programming them into an AI. Of course, the AI will have to learn our values well enough to act in a convincingly friendly way while it's still weak, but knowing our values is not the same as sharing our values. Once it's gained enough power it can just kill us all. That's the most reliable way of ensuring we don't interfere with its goals.


theptip 939 days ago

GP was describing acquiring power, which is completely orthogonal to being aligned with our values. (Indeed, power-seeking is usually deemed to be a bad thing.)

There are certainly ecological and memetic niches where pro-social behaviors will improve fitness. But it’s also certain that anti-social (defect/dominate/parasite) behaviors will improve fitness in many niches.


theptip 939 days ago

How do you “pull the plug” on a datacenter, or on all of the cloud providers if the ASI has processes running everywhere? Given that anyone with a credit card can already achieve robust multi-region deployments, it doesn’t seem hard for an ASI to make itself very hard to “turn off”.

Alternatively an ASI can ally with a group of humans to whom it can credibly promise wealth and power. If you think there is a baby machine god in your basement that is on your side, you’ll fight to protect it.


hoten 939 days ago

Airgap it. Give it no connections to the outside world, just a single controlled interface with a human operator. It's reduced to an advisor at this point, not an agent, but that removes most potential harm short of it tricking its operators into plugging it into the Internet.

In this scenario, pulling the plug is a matter of turning off power to the data centers it runs in - or simply disabling the one mode of external communication it has.


pixl97 939 days ago

I keep hearing this argument, and it is the worst one of all because it neglects human greed.

AI feeds on data. If it can make you a million dollars air-gapped, you'll be able to make a billion with it plugged into the net and manipulating data.


hoten 939 days ago

That may be, but as far as feasibility goes: I'd bet on solving a social problem over an amorphous technical problem (alignment).


Philpax 939 days ago

Historically, the world has not been great at universally coordinating responses to social problems - especially when it only takes one actor to break the “truce.”


hoten 938 days ago

Isn't agreeing to only run an AGI with whatever theoretical alignment controls we come up with also a social agreement? Seems we will have to figure that out one way or another.


theptip 939 days ago

A sufficiently intelligent system that is un-aligned can likely subvert a human operator if it wants to.

But even if we ignore that, note that nobody is building their systems this way, nor will they without extremely draconian laws requiring it. An airgapped system is substantially less valuable than one that is connected to the outside world.

I take the question “why can’t we turn it off” to refer to the actual real systems that we have built and will continue to build, not hypothetical systems we might build if we took safety risks very seriously.
