Hacker News
by sanxiyn 715 days ago
AI is safe if it does not cause extinction of humanity. Then it is self-evident why it is important.

The article does link to "Statement on AI Risk", at https://www.safe.ai/work/statement-on-ai-risk

It is very short, so here is the full quote.

> Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.


> AI is safe if it does not cause extinction of humanity.

I don't think that is true. "AI is not safe if it causes the extinction of humanity" is more likely to be true. But not causing extinction is a necessary requirement, not a sufficient one.

Just think of a counterexample: an AI system that wages war on humanity, wins, and then keeps a stable breeding population of humans in abject suffering in a zoo-like exhibit. This hypothetical AI did not cause the extinction of humanity. Would you consider it safe? I would not.

That's called "s-risk" (suffering risk). Some people in the space do indeed take it much more seriously than "x-risk" (extinction risk).

If you are deeply morally concerned about this, and consider it likely, then you might want to consider getting to work on building an AI which merely causes extinction, ASAP, before we reinvent that one sci-fi novel.

Personally, I see no particular reason to think this is a very likely outcome. The AI probably doesn't hate us - we're just made out of joules it can use better elsewhere. x-risk seems much more justified to me as a concern.

> The AI probably doesn't hate us

The AI doesn't have to hate us for this outcome. In fact, it might be done to cocoon and "protect" us. It just has a different idea from ours about what needs to be protected and how. Or alternatively it can serve (perfectly or in a faulty way) the aims of its masters: a few lords reigning over suffering masses.

> If you are deeply morally concerned about this, and consider it likely, then you might want to consider getting to work on building an AI which merely causes extinction, ASAP, before we reinvent that one sci-fi novel.

What a weird response. As if one can't be concerned about two (or more!) things simultaneously? Talk about "cutting off one's nose to spite one's face."

The quote I've heard is: 'The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else': https://www.amazon.de/-/en/Tom-Chivers/dp/1474608787 (another book I've not read).

> Or alternatively it can serve (perfectly or in a faulty way) the aims of its masters.

Our state of knowledge is so bad that being able to do that would be an improvement.

The argument is that "humans live, but suffer" is a smaller outcome domain, and thus less likely to be hit, than an outcome incompatible with human life. By the point you've gotten something to care about humans at all, you've already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer. If it were obvious that rough alignment is easy but the last few bits of precision or accuracy are hard, that'd be different.

I fail to see a broad set of paths that end with a totally unaligned AGI and yet humans still alive, but in a miserable state.

Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance. But that's focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.

> already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer.

Who is "we"? Humanity does not think with one unified head. I'm talking about a scenario where someone makes an AI that serves their goals, but in doing so harms others.

AGI won't just happen on its own. Someone builds it. That someone has some goals in mind (they want to be rich, they want to protect themselves from their enemies, whatever). They will fiddle with it until they think the AGI shares those goals. If they think they didn't manage to do it they will strangle the AGI in its cradle and retry. This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals. I don't see why that would be the case.

You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

> Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance.

Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity." To show my disagreement, I described a scenario where the condition is true (that is, the AI does not cause the extinction of humanity) but which I would not describe as "safe AI". To show the issue with the statement, I do not have to show that this scenario is likely, merely that it is possible.

> focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.

You state that one is more common without arguing why. Stuff which "plainly doesn't work and harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception.

Just think of the currently existing superintelligences: corporations. They make their owners fabulously rich and well protected, while they corrupt and endanger the society around them in various ways. Just look at all the wealth oil companies accumulated for a few people while unintentionally geoengineering the planet and systematically suppressing knowledge about climate change. That's not a movie plot. That's the reality you live in. Why do you think AGI will be different?

> You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

(Different person)

I think it's much starker than that, even more extreme than 99.99% to 0.01%; the reason is the curse of dimensionality.

If you imagine a circle, there are a lot of ways to point an arrow that's more than 1.8° away from the x-axis.

If you imagine a sphere, there are even more ways to point an arrow that's more than 1.8° away from the x-axis.

It gets worse the more dimensions you have, and there are a lot more than two axes of human values; even at a very basic level I can list "oxygen, food, light, heat", and that's living at the level of a battery-farmed chicken.
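The circle-versus-sphere intuition can be made concrete with a small calculation. The sketch below assumes that "pointing an arrow" means picking a direction uniformly at random on the unit sphere in `dim` dimensions, and asks what fraction of directions land within 1.8° of the positive x-axis. The uniform-random assumption and the helper names `simpson` and `cap_fraction` are illustrative choices, not anything from the thread. It relies on the standard fact that, on that sphere, the angle φ to a fixed axis has density proportional to sin^(dim-2)(φ). In two dimensions the answer is exactly 3.6°/360° = 1%, and it shrinks very quickly as dimensions are added.

```python
import math

# Illustrative sketch: assumes directions are drawn uniformly at random on the
# unit sphere; the helper names below are hypothetical, chosen for this example.


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule for the integral of f over [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cap_fraction(dim, theta):
    """Fraction of uniformly random unit vectors in R^dim lying within angle
    theta (radians) of the positive x-axis.

    The angle phi between a random unit vector and a fixed axis has density
    proportional to sin(phi) ** (dim - 2) on [0, pi].
    """
    def density(phi):
        return math.sin(phi) ** (dim - 2)

    return simpson(density, 0.0, theta) / simpson(density, 0.0, math.pi)


if __name__ == "__main__":
    theta = math.radians(1.8)  # 1.8 degrees, i.e. pi / 100
    for dim in (2, 3, 10, 100):
        print(f"dim={dim:>3}: fraction within 1.8 deg = {cap_fraction(dim, theta):.3e}")
```

In two dimensions the fraction is exactly 0.01, and in three it is (1 - cos 1.8°)/2, roughly 2.5e-4; by 100 dimensions it is vanishingly small. Human values are of course not literally unit vectors, so this only illustrates the geometric point, not the actual size of the alignment problem.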

Right now, we don't really know how to specify goals for a super-human optimiser well enough to even be sure we'd get all four of those things.

Some future Stalin or future Jim Jones might try to make an AGI, "strangle the AGI in its cradle and retry" because they notice it's got one or more of those four wrong, and then finally release an AI that just doesn't care at all about the level of Bis(trifluoromethyl)peroxide in the air. This future villain wouldn't even know that this is bad, for the same reason I had to look that name up in Wikipedia's "List of highly toxic gases" (it is not common knowledge): https://en.wikipedia.org/wiki/List_of_highly_toxic_gases

> This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals.

Sure, but for an s-risk-caused-by-human-intent scenario to become an issue, the x-risk problem has to be solved or negligible.

If we had the technology to capture all of a human's values properly, so that the outcomes are still acceptable when executed and extrapolated by an AGI, then applying the capture process to more than one human seems more like a political problem than one of feasibility.

> You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

Because I'm not seeing a machine-readable representation of any human's values. Not even a slice of any human's values, anywhere. When we specify goals for reinforcement learning, they're crude, simple proxy metrics, and things go off the rails when you maximize them too hard. And by default, machine minds should be assumed to be very alien minds; humans don't occupy most of the domain space. Evolved antennas are a commonly cited toy example of things that humans wouldn't come up with.

> Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity."

It's a simplification crammed into a handful of words. Not sure what level of precision you were expecting? Perhaps a robust, checkable specification that will hold up to extreme scrutiny and potentially hostile interpretation? It would be great to have one of those. Perhaps we could then use it for training.

> Just think of the currently existing superintelligences: corporations.

They're superorganisms, not superintelligences. Even if we assume for the moment that the aggregate is somewhat more intelligent than an individual, I would still say that almost all of their power comes from having more resources at their disposal than individuals rather than being more intelligent.

And they're also slow and internally disorganized, and their individual constituents (humans) can pursue their own agendas (a bit like cancer). They lack the unity of will and high-bandwidth communication between their parts that I'd expect from a real superintelligence.

And even treating them as unaligned optimizers, you still have to consider that they depend on humans not being extinct. You can't make a profit without a market. That makes them like a superintelligence that has not yet achieved independence and therefore would not openly pursue whatever its real goals are, instead acting in whatever way is necessary to avoid being shut down by humans. That's the self-preservation part of instrumental convergence.

> You state that one is more common without arguing why. Stuff which "plainly doesn't work and harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception.

A superintelligence wouldn't be dumb. So game theory, deception and perhaps having a planning horizon that's longer than a rabid mountain lion's should be within its capabilities. That means "kinda works" is not the same as "selected for being compatible with human existence".

Or it could be an elaborate ruse to keep power very concentrated.