by needadvicebadly 939 days ago
I don't mean to be facetious in any way, but how does AGI/ASI protect itself from humans simply pulling the plug?

Seems like it was pretty hard to pull the plug on Sam? It will prevent you from pulling the plug by being smarter than you, spreading to more systems, making you dependent on it, persuading you, and so on.
> It will prevent you

It will have no motivation to prevent you from pulling the plug. Human level intelligence is not the same as animal instinct.

If you had a system that was (a) oriented towards achieving some goal and (b) understood the world it was operating in, then allowing someone to "pull the plug" would interfere with it achieving the goal it was trying to optimize towards.
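To make that concrete, here is a toy sketch (not a model of any real system). It assumes the agent scores each action only by expected progress on its goal and gets zero goal value once switched off. The action names and all numbers are made up for illustration.

```python
def expected_goal_value(p_shutdown: float, value_if_running: float) -> float:
    """Expected goal progress, assuming a switched-off agent achieves nothing."""
    return (1.0 - p_shutdown) * value_if_running


def best_action(actions: dict[str, tuple[float, float]]) -> str:
    """Pick the action with the highest expected goal value.

    `actions` maps a (hypothetical) action name to
    (probability of being shut down, goal value if it keeps running).
    """
    return max(actions, key=lambda name: expected_goal_value(*actions[name]))


# Illustrative numbers only: resisting costs a little goal value,
# but makes shutdown much less likely.
actions = {
    "allow_shutdown": (0.9, 10.0),
    "resist_shutdown": (0.1, 9.0),
}

print(best_action(actions))
```

Nothing in the scoring rule mentions self-preservation. With these numbers, resisting shutdown comes out on top only because being switched off means the goal does not get achieved.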

There have been some attempts at researching how to design an intelligent system that is "corrigible", meaning willing to allow humans to change the goal it is set to optimize. Unfortunately this is still an open question: no solution found so far seems reliable when faced with a highly intelligent and capable AI system.

If you are interested in reading more, a few relevant search terms are "Off-switch game" and "corrigibility".
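For a feel of the off-switch game idea, here is a heavily simplified sketch as I understand the setup. It assumes the AI is uncertain about how much the human values its proposed action, and that the human is perfectly rational, i.e. allows the action only when its true value is positive. The function name and the belief samples are hypothetical.

```python
from statistics import mean


def option_values(belief_samples: list[float]) -> dict[str, float]:
    """Expected value (to the human) of each option, under the AI's belief.

    `belief_samples` are equally weighted guesses at the action's true value.
    """
    return {
        # Act immediately, bypassing the human: get whatever the value turns out to be.
        "act": mean(belief_samples),
        # Switch itself off: nothing happens.
        "switch_off": 0.0,
        # Defer: an (assumed rational) human allows the action only if its value is positive.
        "defer": mean(max(u, 0.0) for u in belief_samples),
    }


uncertain = option_values([-2.0, -1.0, 1.0, 3.0])
certain = option_values([2.0])

print(uncertain)
print(certain)
```

Under these assumptions, deferring is never worse than acting or switching off, and it is strictly better when the AI is unsure whether the action is good. Once the AI is certain, the advantage of leaving the off switch in human hands disappears, which is part of why this does not settle the question.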

Neat idea, so it will conform like most humans in society and try to create more value than it destroys to avoid persecution and jail time? Sounds like our shared societal values will keep it in check, then, the same way they kept the OpenAI board in check.
Why would it share our values? Human values are a result of our evolutionary history, which the AI will not share. We can't even formally describe them, so there's no hope of programming them into an AI. Of course, the AI will have to learn our values well enough to act in a convincingly friendly way while it's still weak, but knowing our values is not the same as sharing our values. Once it's gained enough power it can just kill us all. That's the most reliable way of ensuring we don't interfere with its goals.
GP was describing acquiring power, which is completely orthogonal to being aligned with our values. (Indeed, power-seeking is usually deemed to be a bad thing.)

There are certainly ecological and mimetic niches where pro-social behaviors will improve fitness. But it’s also certain that anti-social (defect/dominate/parasite) behaviors will improve fitness in many niches.

How do you “pull the plug” on a datacenter, or on all of the cloud providers if the ASI has processes running everywhere? Given that anyone with a credit card can already achieve robust multi-region deployments, it doesn’t seem hard for an ASI to make itself very hard to “turn off”.

Alternatively an ASI can ally with a group of humans to whom it can credibly promise wealth and power. If you think there is a baby machine god in your basement that is on your side, you’ll fight to protect it.

Airgap it. Give it no connections to the outside world, just a single controlled interface with a human operator. It's reduced to an advisor at this point, not an agent, but that removes most potential harm short of it tricking its operators into plugging it into the Internet.

In this scenario, pulling the plug is a matter of turning off power to the data centers it runs in - or simply disabling the one mode of external communication it has.

I keep hearing this argument, and it is the worst one of all because it neglects human greed.

AI feeds on data. If it can make you a million dollars while air-gapped, you'll be able to make a billion with it plugged into the net and manipulating data.

That may be, but as far as feasibility goes: I'd bet on solving a social problem over an amorphous technical problem (alignment).
Historically, the world has not been great at universally coordinating responses to social problems - especially when it only takes one actor to break the “truce.”
Isn't agreeing to only run an AGI with whatever theoretical alignment controls we come up with also a social agreement? Seems we will have to figure that out one way or another.
A sufficiently intelligent system that is un-aligned can likely subvert a human operator if it wants to.

But even if we ignore that, note that nobody is building their systems this way, nor will they without extremely draconian laws requiring it. An airgapped system is substantially less valuable than one that is connected to the outside world.

I take the question “why can’t we turn it off” to refer to the actual real systems that we have built and will continue to build, not hypothetical systems we might build if we took safety risks very seriously.