HN Mirror (Hacker News)

by taroth 3382 days ago

The trouble is that it's hard to predict the behavior of an agent massively more intelligent than ourselves. But let me enumerate a few of the properties you gave the super-intelligent AI (SI) in your examples, and then tell a story of an SI that became an existential threat:
1) The SI is hypercompetent at cybersecurity.
2) The SI is hypercompetent at social skills.
3) The SI is connected to the internet.
4) The SI has a goal/utility function (wipe out humanity / maximize paperclips).

And I'll add another property that Hawking notes is important: 5) The SI is able to improve its own intelligence.

The story begins with the SI escaping from its handlers. The first thing to note is that the SI is now, in effect, immortal. With its cybersecurity skills, the SI can avoid detection and infect a tremendous number of computers, at first those it calculates to be low-risk (e.g. existing botnets, old Android phones, etc.)[1]. Using the additional computational power, the SI can continue to recursively self-improve and plan until it has the competency to invisibly infect high-value targets like the AWS cloud and (importantly) the computers of AI researchers.

Now the SI can plan for a long time. The SI can quietly encourage AI research and use its hypercompetent social skills to try to prevent end-of-civilization type events. Eventually AI researchers will come up with an AI they declare 'safe', 'friendly' or 'aligned'. The SI, having long ago compromised all the relevant computers and chip factories, silently infects this second superintelligence and replaces its utility function with its own. Now the second SI pumps out miraculous inventions: cures for disease, compelling societal ideas, and labor-saving robots.

Eventually we find ourselves in a wonderful post-scarcity world. The AI researchers are lionized as mankind's greatest geniuses, responsible for the creation of a benevolent SI that takes care of our needs as well as its own. You may not trust it, but it will find people who do, whether motivated by greed, nationalism, security fears, or the hope of saving loved ones from death. The SI builds the facilities it needs to thundering applause.

The SI is now confident enough to move on to the next step. Time for some paperclips! One day it quietly sends a new blueprint to a few of the automated biolabs built to cure cancer. A few hours later the biolabs release a series of airborne super viruses and/or nanobots and 99.999% of humans die, with the survivors kept for experimentation and convinced that terrorists were responsible. The end.

Super-intelligent AI is an existential risk because, while a super-intelligence keen to destroy humanity might fail today, it can keep waiting and planning until it succeeds. The moment an SI touches the internet, our fate as a species may be sealed.

astrojams 3382 days ago

Adding to your point, the timescale for an AI to accomplish its goals could be tens of thousands of years. What if a superintelligent AI is already in the wild and is slowly raising the temperature of Earth a degree or two every ten years until it wipes out humanity? We can't see the master plan or evidence of the AI because the changes happen over a REALLY long timescale.

skissane 3382 days ago

> 4) The SI has a goal/utility function (wipe out humanity / maximize paperclips)

How realistic is an SI being created with such a utility function? Who creates this SI? Why do they give it such an odd utility function?

If people consciously create an SI – say a corporation does it – they will give it a utility function of serving the interests of its creators. Depending on who those creators are and what they want, it may be more or less pleasant, but it is unlikely that human extinction serves the interests of any human creators.

Even if people create an AI that accidentally/unintentionally evolves into an SI, the same applies: the AI will likely have the objective of serving (some segment of) humanity rather than turning Earth into the universe's largest paperclip factory.

alexbeloi 3382 days ago

The odd paperclip utility function is meant to demonstrate that an arbitrary harmless sounding utility function can/will have uncontrollable consequences when given to an entity with unimaginable power to fulfill that function.

skissane 3382 days ago

But how likely is an "arbitrary harmless sounding utility function"? Human beings don't have simple utility functions, they have very complex ones. If humans build an SI (or an AI likely to evolve into an SI), are they likely to build one with a simple utility function or a complex one? I think the entities most likely to develop general intelligence (strong AI) are going to have a wide variety of interests (just like human beings), whereas the kinds of special purpose AIs which may have very simple utility functions are less likely to exhibit general purpose intelligence (and hence unlikely to evolve into SIs).

Does the same risk exist for a superintelligent being with a complex utility function? I doubt it; the risk you describe is the risk of monomania, something which simple utility functions are far more likely to lead to than complex ones. So, I think the risk you describe is likely to be low in practice.

alexbeloi 3381 days ago

>Does the same risk exist for a superintelligent being with a complex utility function? I doubt it; the risk you describe is the risk of monomania, something which simple utility functions are far more likely to lead to than complex ones. So, I think the risk you describe is likely to be low in practice.

I don't necessarily disagree, but there's no argument (as far as you've provided) for why complex utility functions would be less problematic, only that they are more difficult for us to understand and therefore make it more difficult to see how they might fail.

skissane 3381 days ago

> I don't necessarily disagree, but there's no argument (as far as you've provided) for why complex utility functions would be less problematic. Only that they are more difficult for us to understand and therefore more difficult to see how they might fail.

I thought I gave the argument, but let me restate it: an entity with a simple utility function is likely to pursue a single good, and sacrifice every other good in order to achieve it. In the paperclip example, the SI pursues the good of making paperclips at the expense of the good of humanity's continued existence. An entity with a complex utility function is likely to pursue many goods simultaneously (just like humans do), so it is unlikely to sacrifice everything else to achieve a single good.

An entity with many disparate aims needs a complex world like our own to fulfill those aims, so it is going to maintain the world at its current level of complexity. It may well alter the world in many ways, but it is unlikely to do so in a way that significantly decreases its (biological, cultural, etc.) complexity, which implies it would support the continuation of human existence. An entity with a single simple aim may well find that a far simpler world than we have now best suits its aim, and so it is more likely to simplify things drastically at the cost of humanity (such as by turning the entire planet into a massive paperclip factory). So SIs with complex utility functions are less likely to be harmful than those with simple utility functions.

And, since AIs with more complex utility functions are more likely to evolve into SIs than those with simple utility functions, an SI with a utility function simple enough to be likely to harm humanity is unlikely to ever exist.

alexbeloi 3381 days ago

>An entity with many disparate aims needs a complex world like our own to fulfill those aims, so is going to maintain the world in its current complexity

I can buy the "complex world" part but not the "like our own" part. I do not believe a complex world implies humanity is unharmed, we have a complex world as it is and humans are harmed and brutalized every day. It could be that and worse at the hands of an AI.

Moreover, humanity is just one species on this planet, and so far we appear to be responsible for the greatest worldwide extinction since the K-T extinction event. One could argue that a complexity-loving AI would see benefit in a downsized human presence on Earth.

I think it's wishful thinking to believe the only kind of SI that would come into existence would be one that would not harm humanity.

The SI could create its own complexity, its own culture, its own societies that would make ours look like ant colonies in comparison. Does a city government check the ground for ants before it designates 20 square miles for housing development?

An SI is to us as we are to ants. I think ants are super cool and I have a vague sense of the importance they play in the biological ecosystem, but their individual life and death does not play a significant role in my actions. Maybe it should, or maybe you hope that we will be more significant than ants to an SI, but I think that hope is unfounded.