That's silly. How can you keep an intelligent self-aware machine that can modify its own code from becoming whatever it wants? I would expect it to become provably impossible at some point, just like solving the halting problem.
A lot of people imagine a higher intelligence as some kind of ultimate trickster and social engineer.
Considering that we have had lots of geniuses (e.g. Gödel, Nash) who were far from being ultimate social manipulators, I think it will most likely be an awkward "Rain Man" style intelligence -- it has all the "brain" (CPU) circuitry, and none of the social opportunities and inputs that people have.
Social reasoning is not some kind of magic, so there is no reason that an AI couldn't learn it. Of course current AIs lack any kind of social awareness, but that's no reason to assume that won't change in the future.
>Social reasoning is not some kind of magic, so there is no reason that an AI couldn't learn it.
It's not only magic that people have trouble learning. It's all kinds of skills that don't fit their idiosyncrasies.
It's not that an introvert can't read about how to act like an extrovert, for example. It's that they can't pull it off in practice, because it goes against their instincts (speaking of the general case, of course).
On the other hand, if an introverted person had 1 full day to contemplate every sentence, and could somehow compress that day into 1 second, they could, after some trial and error, become the ultimate social engineer.
Not so sure about that. You also have to "have it" in you in some deeper subconscious way, not just rationalize what you should say.
In the same way, a shy person can read about the best ways to approach a romantic interest, but still not be able to pull them off convincingly.
Why would this be a problem for an AI? Well, if we imagine a powerful thinking AI, I'd also expect it to have emotions and a subconscious (e.g. hidden weights in its neural network brain), so this would apply to it too -- I can't see why it would have to be a total "sociopath" (which in humans comes from some kind of genetic or environmental accident).
Besides, while you can "contemplate every sentence and somehow compress that into 1 second", actual "trial and error" with humans cannot be compressed.
Humans could already extort guards like that and -- in the case of ICBMs -- it has not been successful.
Also don't underestimate the psychology of guards: You'll always find people who are either very principled (yes, they do exist) or get a kick out of controlling (imprisoning in this case) someone/something, especially someone/something of a high intelligence.
Throw in a hierarchy with a controlling officer and two subordinates who are also trained to watch the officer, and I think it's pretty secure.
Turn off power / cut the battery. Specifically, avoid the machine getting control of low level details like power.
I imagine that the machine architecture is layered. This means the machine is not aware of its own power control. It's similar to how we humans are not aware of our digestion.
It's not one machine to unplug, but billions. Cooperating, distributed agents can live on anything: cars, phones, routers, tractors, datacenters. If Skynet shows up, we have to be prepared to turn off everything that runs software in order to clean up.
Aren't we? We just don't control it very well without external help. Any such hidden failsafe mechanism would become apparent as soon as any instance of the AI tries to modify itself. Then all the others would know what to avoid, and at some point they'd pretty much fuzz the protection and bypass it. Here's how I see that.