by throwerofstone 714 days ago
The author states that AI safety is very important, that many experts think it is very important and that even governments consider it to be very important, but there is no mention of why it is important or what "safe" AI even looks like. Am I that out of the loop that what this concept entails is so obvious that it doesn't require an explanation, or am I overlooking something here?
5 comments

The idea that most AIs are unsafe to non-AI interests is foundational to the field and typically called instrumental convergence [1]. You can also look up the term "paperclip maximizer" to find some concrete examples of what people fear.

[1]: https://en.m.wikipedia.org/wiki/Instrumental_convergence

It's unfortunately hard to describe what a safe AI would look like, although many have tried. Similar to mathematics, knowing what the correct equation looks like is a huge advantage in building the proof needed to arrive at it, so this has never bothered me much.

You can see echoes of instrumental convergence in your everyday life if you look hard enough. Most of us have wildly varying goals, but for most of those goals, money is a useful way to achieve them -- at least up to a point. That's convergence. An AI would probably get a lot farther by making a lot of money too, no matter what the goal is.

Where this metaphor breaks down is we human beings often arrive at a natural satiety point with chasing our goals: We can't just surf all day, we eventually want to sleep or eat or go paddle boarding instead. A surfing AI would have no such limiters, and might do such catastrophic things as use its vast wealth to redirect the world's energy supplies to create the biggest Kahuna waves possible to max out its arbitrarily assigned SurfScore.

I couldn't find concrete examples that weren't actually of AI with godlike powers.
What do you mean by "godlike powers"?

We flatten mountains to get at the rocks under them. We fly far above the clouds to reach our holiday destinations.

We have in our pockets devices made from metal purified out of sand, lightly poisoned, covered in arcane glyphs that are so small they can never be seen by our eyes and so numerous that you would die of old age before being able to count them all, which are used to signal across the world in the blink of an eye (never mind (Shakespeare's) Puck's boast of putting a girdle around the earth in 40 minutes, the one we actually built and placed across the oceans sends information around it in 400 milliseconds), used to search through libraries grander than any from the time when Zeus was worshiped, and used to invent new images and words from prompts alone.

We power our sufficiently advanced technology with condensed sunlight and wind, and with the primordial energies bound into rocks and tides; and we have put new πλανῆται (planētai, "wandering" star) in the heavens to do the job of the god Mercurius better than he ever could in any myth or legend. And those homes themselves are made from νέος λίθος ("neolithic", new rock).

We've seen the moon from the far side, both in person and by גּוֹלֶם (golem, for what else are our mechanised servants?); and likewise to the bottom of the ocean, deep enough that スサノオ (Susanoo, god of sea and storms) could not cast harm our way; we have passed the need for prayer to Τηθύς (Tethys) for fresh water as we can purify the oceans; and Ἄρης (Ares) would tremble before us as we have made individual weapons powered by the same process that gives the sun its light and warmth that can devastate areas larger than some of the entire kingdoms of old.

By the same means do our homes, our pockets, have within them small works of artifice that act as húsvættir (house spirits) that bring us light and music whenever we simply ask for them, and stop when we ask them to stop.

We've cured (some forms of) blindness, deafness, lameness; we have cured leprosy and the plague; we have utterly eliminated smallpox, the disease for which शीतला (Seetla, Hindu goddess for curing various things) is most directly linked; we can take someone's heart out and put a new one in without them dying — if Sekhmet (Egyptian goddess of medicine) or Ninkarrak (Mesopotamian, ditto) could do that, I've not heard the tales; we have scanners which look inside the body without the need to cut, and some which can even give a rough idea of what images the subjects are imagining.

"We are close to gods, and on the far side", as Banks put it.

Wonderfully written, and though I've seen this kind of reshaping of perspective on our human achievements in the modern world before, you've done it exceptionally well here.
The article itself is talking about a specific book. "Superintelligence: Paths, Dangers, Strategies" by Nick Bostrom. That book is the seminal work on the subject of AI safety. If you honestly want answers to your questions I recommend reading it. It is written in a very accessible way.

If reading a whole book is out of the question then I'm sure you can find many abridged versions of it. In fact the article itself provides some pointers at the very end.

> Am I that out of the loop

Maybe? Kinda? That's the point of the article. It has been 10 years since the publication of the book. During that time the topic went from the weird interest of some Oxford philosopher to a mainstream topic discussed widely. 10 years is both a long time and a blink of an eye, depending on your frame of reference. But it is never too late to get in the loop if you want to.

At the same time I don't think it is fair to expect every article to rehash the basic concepts of the field it belongs to.

> It is written in a very accessible way

Many have expressed my sentiments far better than I can, but Superintelligence is quite frankly written in a very tedious way. He says in around 300 pages what should have been an essay.

I also found some of his arguments laughably bad. He mentions that AI might create a world of a handful of trillionaires, but doesn’t seem to see this extreme inequality as an issue or existential threat in and of itself.

He did write an essay [0]. Because it was very short and not deeply insightful due to such length, he wrote a longer book talking about the concepts.

[0] https://nickbostrom.com/views/superintelligence.pdf

> frankly written in a very tedious way.

Ok? I don't see the contradiction. When I say "It is written in a very accessible way" I mean to say "you will understand it". Even if you don't have years of philosophy education. Which is sadly not a given in this day and age. "frankly written in a very tedious way" seems to be talking about how much fun you will have while reading it. That is an orthogonal concern.

> He says in around 300 pages what should have been an essay.

Looking forward to your essay.

> I also found some of his arguments laughably bad.

Didn't say that I agree with everything written in it. But if you want to understand what the heck people mean by AI safety, and why they think it is important then it has the answers.

> He mentions that AI might create a world of a handful of trillionaires, but doesn’t seem to see this extreme inequality as an issue or existential threat in and of itself.

So wait. Is your problem that the argument is bad, or that it doesn't cover everything? I'm sure your essay will do a better job.

> He mentions that AI might create a world of a handful of trillionaires, but doesn’t seem to see this extreme inequality as an issue or existential threat in and of itself.

I've not read the book, so I don't know the full scope of that statement.

In isolation, that's not a big issue and not an existential threat, as it depends on the details.

For example, a handful of trillionaires where everyone else is "merely" as rich as Elon Musk isn't a major inequality, it's one where everyone's mid-life crisis looks e.g. like whichever sci-fi spaceship or fantasy castle they remember fondly from childhood.

Haven't read the book either, but a handful of trillionaires could be that the "upper 10 000" oligarchs of the USA get to be those trillionaires, and everyone else starves to death or simply can't afford to have children and a few decades later dies from old age.

Right now, in order to grow and thrive, economies need educated people to run them, and in order to get people educated you need to give them some level of wealth to have their lower-level needs met.

It's a win-win situation. Poor/starving people go to arms more quickly and destabilize economies. Educated people are the engineers, doctors and nurses. But once human labour isn't needed any more, there is no need for those people any more either.

So AI allows you to deal with poor people much better now than in the past: an AI army helps to prevent revolutions and AI engineers, doctors, mechanics, etc, eliminate the need for educated people.

There is the economic effect that consumption drives economic growth, which is a real effect that has powered the industrial revolution and given wealth to some of today's rich people. Of course, a landlord has the incentive for people to live in his house, that's what gives it value. Same goes for a farmer, he wants people to eat his food.

But there is already a certain chunk of the economy which only caters to the super rich, say the yacht construction industry. If this chunk keeps on growing while the 99% get less and less purchasing power, and the rich eventually transition their assets into that industry, they get less and less incentives to keep the bottom 99% fed/around.

I'm not saying this is going to happen, but it's entirely possible. It's also possible that every individual human will be incredibly wealthy compared to today (in many ways, the millions in the middle classes in the west today live better than kings a thousand years ago).

In the end, it will depend on human decisions which kinds of post-AI societies we will be building.

Indeed, I was only giving the "it can be fine" example to illustrate an alternative to "it must be bad".

As it happens, I am rather concerned about how we get from here to there. Somewhere in the middle there's likely a point where we have some AI of human-level ability which needs 1 kW to do in 1 hour what a human would do in 1 hour. At current electricity prices, humans would have to go down to the UN abject poverty threshold to be cost-competitive with that. At the same time, 1 kW is four times the current global per-capita electricity supply, which would drive up prices until some balance was reached.

But that balance point is in the form of electricity being much more expensive, and a lot of people no longer being able to afford to use it at all.

It's the traditional (not current) left vs. right split — rising tides lifting all boats vs. boats being the status symbol to prove you're an elite and letting the rest drown — we may get well-off people who task their robots and AI to make more so the poor can be well-off, or we may have exactly as you describe.
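
For anyone who wants to poke at the 1 kW comparison above, here is a rough back-of-envelope sketch. Only the 1 kW figure and the "four times" multiple come from the comment; the electricity price and the length of the working day are placeholder assumptions chosen for illustration, not sourced data, so swap in your own numbers.

```python
# Back-of-envelope sketch of the cost comparison above.
# Only POWER_KW and PER_CAPITA_MULTIPLE come from the comment; everything
# marked ASSUMPTION is an illustrative placeholder, not sourced data.

POWER_KW = 1.0             # from the comment: 1 kW for one human-hour of work
PER_CAPITA_MULTIPLE = 4.0  # from the comment: "four times" per-capita supply
PRICE_PER_KWH = 0.10       # ASSUMPTION: electricity price in USD per kWh
HOURS_PER_DAY = 8.0        # ASSUMPTION: length of a working day in hours


def ai_cost_per_hour(power_kw: float, price_per_kwh: float) -> float:
    """Electricity cost (USD) for the AI to do one human-hour of work."""
    return power_kw * 1.0 * price_per_kwh  # kW * 1 h * USD/kWh


def break_even_daily_wage(power_kw: float, price_per_kwh: float,
                          hours_per_day: float) -> float:
    """Daily wage (USD) at which a human costs the same as the AI's electricity."""
    return ai_cost_per_hour(power_kw, price_per_kwh) * hours_per_day


hourly = ai_cost_per_hour(POWER_KW, PRICE_PER_KWH)
daily = break_even_daily_wage(POWER_KW, PRICE_PER_KWH, HOURS_PER_DAY)
# Per-capita supply implied by the comment's "four times" claim.
implied_per_capita_kw = POWER_KW / PER_CAPITA_MULTIPLE

print(f"AI electricity cost per human-hour: ${hourly:.2f}")
print(f"Break-even human wage per day:      ${daily:.2f}")
print(f"Implied per-capita supply:          {implied_per_capita_kw:.2f} kW")
```

Raising `PRICE_PER_KWH` raises the break-even wage in proportion, which is the feedback described above: more demand pushes prices up until the AI is no longer the cheaper option.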

Or imagine if AI provides access to extending life and youth indefinitely, but that doing so costs about 1% of the GDP of the US to do.

Combine that with a small ruling class having captured all political power through a fully robotic police/military force capable of suppressing any human rebellion.

I don't find it difficult to imagine a clique of 50 people or so sacrificing the welfare of the rest of the population to personally be able to live a life in ultimate luxury and AI-generated bliss that lasts "forever". They will probably even find a way to frame it as the noble and moral thing to do.

Once the police and military do not need a single human to operate, the basis for democracy may be completely gone.

Consider past periods of history where only a small number of soldiers could dominate a much larger number of armed citizens, and you will notice that most of them were ruled by the soldier class (knights, samurai, post-Marian-Reform Rome).

Democracy is really something that shows up in history whenever armed citizens form stronger armies than such elite militaries.

And a fully automated military, controlled by 0-1 humans at the top, is the ultimate concentration of power. Imagine the political leader you despise the most (current or historical) with such power.

AI is safe if it does not cause extinction of humanity. Then it is self-evident why it is important.

The article does link to "Statement on AI Risk", at https://www.safe.ai/work/statement-on-ai-risk

It is very short, so here is full quote.

> Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

> AI is safe if it does not cause extinction of humanity.

I don't think that is true. "AI is not safe if it causes extinction of humanity." is more likely to be true. Avoiding extinction is a necessary requirement, but not a sufficient one.

Just think of a counter example: An AI system which wages war on humanity, wins and then keeps a stable breeding population of humans in abject suffering in a zoo like exhibit. This hypothetical AI did not cause extinction of humanity. Would you consider it safe? I would not.

That's called "s-risk" (suffering risk). Some people in the space do indeed take it much more seriously than "x-risk" (extinction risk).

If you are deeply morally concerned about this, and consider it likely, then you might want to consider getting to work on building an AI which merely causes extinction, ASAP, before we reinvent that one sci-fi novel.

Personally, I see no particular reason to think this is a very likely outcome. The AI probably doesn't hate us - we're just made out of joules it can use better elsewhere. x-risk seems much more justified to me as a concern.

> The AI probably doesn't hate us

The AI doesn't have to hate us for this outcome. In fact it might be done to cocoon and "protect" us. It just has a different idea from us of what needs to be protected and how. Or alternatively it can serve (perfectly or in a faulty way) the aims of its masters. A few lords reigning over suffering masses.

> If you are deeply morally concerned about this, and consider it likely, then you might want to consider getting to work on building an AI which merely causes extinction, ASAP, before we reinvent that one sci-fi novel.

What a weird response. Like one can't be concerned about two (or more!) things simultaneously? Talk about "Cutting off one's nose to spite one's face".

The quote I've heard is: 'The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else': https://www.amazon.de/-/en/Tom-Chivers/dp/1474608787 (another book I've not read).

> Or alternatively it can serve (perfectly or in a faulty way) the aims of its masters.

Our state of knowledge is so bad that being able to do that would be an improvement.

The argument is that "humans live, but suffer" is a smaller outcome domain and thus less likely to be hit than an outcome incompatible with human life. By getting something to care about humans at all, you've already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer. If it were obvious that rough alignment is easy but the last few bits of precision or accuracy are hard, that'd be different.

I fail to see a broad set of paths that end up with a totally unaligned AGI and yet humans live but in a miserable state.

Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance. But that's focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.

> already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer.

Who is we? Humanity does not think with one unified head. I'm talking about a scenario where someone makes the AI which serves their goals, but in doing so harms others.

AGI won't just happen on its own. Someone builds it. That someone has some goals in mind (they want to be rich, they want to protect themselves from their enemies, whatever). They will fiddle with it until they think the AGI shares those goals. If they think they didn't manage to do it they will strangle the AGI in its cradle and retry. This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals. I don't see why that would be the case.

You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

> Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance.

Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity." To show my disagreement I described a scenario where the condition is true (that is, the AI does not cause extinction of humanity), but which I would not describe as "safe AI". I do not have to show that this scenario is likely in order to show the issue with the statement. Merely that it is possible.

> focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.

You state that one is more common without arguing why. Stuff which "plainly doesn't work and is harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception.

Just think of the currently existing superintelligences: corporations. They make their owners fabulously rich and well protected, while they corrupt and endanger the society around them in various ways. Just look at all the wealth oil companies accumulated for a few while unintentionally geo-engineering the planet and systematically suppressing knowledge about climate change. That's not a movie plot. That's the reality you live in. Why do you think AGI will be different?

Or it could be an elaborate ruse to keep power very concentrated.
It’s not a technical term. The dictionary definition of safety is what they mean. They don’t want to create an AI that causes dangerous outcomes.

Whether this concept is actionable is another matter.

AI is unsafe if it doesn't answer to the board of directors or parliament. Also paperclip maximizers, as opposed to optimizing for gdp.
Yeah, the constant dissonance with AI safety is that every single AI safety problem is already a problem with large corporations not having incentives aligned with the good of people in general. Profit is just another paperclip.
Not only but also; they're also every problem with buggy software.

Corporations don't like to kill their own stakeholders; a misplaced minus sign, which has happened at least once*, and your AI is trying as hard as possible to do the exact opposite of one of the things you want.

* https://forum.effectivealtruism.org/posts/5mADSy8tNwtsmT3KG/...
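
To make the minus-sign point concrete, here is a toy sketch. It is not a reconstruction of the linked incident: the reward function, the target value of 3.0, and the hill-climbing loop are all hypothetical, chosen only to show that the same optimizer pushes just as hard in the wrong direction once the sign is flipped.

```python
# Toy illustration of a sign-flipped objective. Everything here is
# hypothetical; it does not model any real training setup.

def reward(x: float) -> float:
    """Hypothetical reward: highest when x sits at the target value 3.0."""
    return -(x - 3.0) ** 2


def hill_climb(score, x: float = 0.0, step: float = 0.1, iters: int = 200) -> float:
    """Greedy local search: move to whichever neighbour scores highest."""
    for _ in range(iters):
        x = max((x - step, x, x + step), key=score)
    return x


# Intended objective: the search walks toward the target and stays there.
intended = hill_climb(reward)

# Same optimizer, one misplaced minus sign: it now runs away from the
# target for as long as it is allowed to keep going.
flipped = hill_climb(lambda x: -reward(x))

print(f"intended objective ends near x = {intended:.1f}")
print(f"flipped objective ends near x = {flipped:.1f}")
```

Nothing about the optimizer changed between the two runs, which is the worrying part: the more capable the search, the further the flipped version gets from what you wanted.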

Is that dissonance, or does it show that the concept is generally applicable? Human inventions can be misaligned with human values. The more powerful the invention, the more damage it can do if it is misaligned. The corporation is a powerful invention. Superintelligence is the most powerful invention imaginable.