| HN Mirror


by the8472 715 days ago

The argument is that "humans live, but suffer" is a smaller outcome domain and thus less likely to be hit than an outcome incompatible with human life. Because once you have gotten something to care about humans at all, you've already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer. If it were obvious that rough alignment is easy but the last few bits of precision or accuracy are hard, that'd be different.

I fail to see a broad set of paths that end up with a totally unaligned AGI and yet humans live but in a miserable state.

Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance. But that's focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.


krisoft 714 days ago

> already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer.

Who is we? Humanity does not think with one unified head. I'm talking about a scenario where someone makes the AI which serves their goals, but in doing so harms others.

AGI won't just happen on its own. Someone builds it. That someone has some goals in mind (they want to be rich, they want to protect themselves from their enemies, whatever). They will fiddle with it until they think the AGI shares those goals. If they think they didn't manage to do it they will strangle the AGI in its cradle and retry. This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals. I don't see why that would be the case.

You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

> Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance.

Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity." To show my disagreement I described a scenario where the condition is true (that is, the AI does not cause extinction of humanity), but which I would not describe as "safe AI". I do not have to show that this scenario is likely in order to show the issue with the statement. Merely that it is possible.

> focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes.

You state that one is more common without arguing why. Stuff which "plainly doesn't work and harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception.

Just think of the currently existing superintelligences: corporations. They make their owners fabulously rich and well protected, while they corrupt and endanger the society around them in various ways. Just look at all the wealth oil companies accumulated for a few while unintentionally geo-engineering the planet and systematically suppressing knowledge about climate change. That's not a movie plot. That's the reality you live in. Why do you think AGI will be different?


ben_w 714 days ago

> You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

(Different person)

I think it's much starker than that, more even than 99.99% to 0.01%; the reason is the curse of high dimensionality.

If you imagine a circle, there's a lot of ways to point an arrow that's more than 1.8° away from the x-axis.

If you imagine a sphere, there's even more ways to point an arrow that's more than 1.8° away from the x-axis.

It gets worse the more dimensions you have, and there are a lot more than two axes of human values; even at a very basic level I can go "oxygen, food, light, heat", and that's living at the level of a battery-farmed chicken.
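To put rough numbers on the circle and sphere comparison, here is a minimal Python sketch. It assumes directions are drawn uniformly at random and that "close" means within 1.8° of the positive x-axis direction; the function name `cap_fraction` and the midpoint-rule integration are illustrative choices, not anything from the thread.

```python
import math


def cap_fraction(n_dims: int, angle_deg: float, steps: int = 100_000) -> float:
    """Fraction of uniformly random directions in n_dims-dimensional space
    that lie within angle_deg of one fixed axis direction (illustrative helper)."""
    theta = math.radians(angle_deg)
    power = n_dims - 2  # the surface measure at polar angle phi scales as sin(phi)**(n_dims - 2)

    def integral(upper: float) -> float:
        # Midpoint rule for the integral of sin(phi)**power from 0 to upper
        h = upper / steps
        return h * sum(math.sin((i + 0.5) * h) ** power for i in range(steps))

    return integral(theta) / integral(math.pi)


if __name__ == "__main__":
    for n in (2, 3, 4, 10):
        print(n, cap_fraction(n, 1.8))
```

Under these assumptions, `cap_fraction(2, 1.8)` comes out at 1% (the 3.6° arc out of 360°) and `cap_fraction(3, 1.8)` at roughly 0.025%, so the share of "close enough" directions shrinks quickly as dimensions are added.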

Right now, we don't really know how to specify goals for a super-human optimiser well enough to even be sure we'd get all four of those things.

Some future Stalin or future Jim Jones might try to make an AGI, "strangle the AGI in its cradle and retry" because they notice it's got one or more of those four wrong, and then finally release an AI that just doesn't care at all about the level of Bis(trifluoromethyl)peroxide in the air. And this future villain doesn't even know that this is bad, for the same reason I had to get that name from the Wikipedia "List of highly toxic gases": it is not common knowledge. https://en.wikipedia.org/wiki/List_of_highly_toxic_gases


the8472 714 days ago

> This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals.

Sure, but for the s-risk-caused-by-human-intent scenario to become an issue the x-risk problem has to be solved or negligible.

If we had the technology to capture all of a human's values properly, so that their outcomes are still acceptable when executed and extrapolated by an AGI, then applying the capture process to more than one human seems more like a political problem than one of feasibility.

> You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so?

Because I'm not seeing a machine-readable representation of any human's values. Or even a slice of any human's values, anywhere. When we specify goals for reinforcement learning they're crude, simple proxy metrics, and things go off the rails when you maximize them too hard. And by default machine minds should be assumed to be very alien minds; humans aren't occupying most of the domain space. Evolved antennas are a commonly cited toy example of things that humans wouldn't come up with.
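As a toy illustration of a proxy metric going off the rails, here is a minimal Python sketch. Everything in it is hypothetical: `true_value` stands in for what we actually want, `proxy_metric` for the crude measurable target, and the specific formulas are made up purely to show the shape of the problem.

```python
def true_value(effort: float) -> float:
    # Hypothetical "what we actually want": improves at first, then degrades
    return effort - effort ** 2 / 10.0


def proxy_metric(effort: float) -> float:
    # Hypothetical crude proxy: keeps rewarding more effort forever
    return effort


def optimise_proxy(steps: int, step_size: float = 1.0) -> list:
    """Greedy hill-climbing on the proxy only.
    Returns a list of (effort, proxy, true value) tuples, one per step."""
    effort = 0.0
    history = []
    for _ in range(steps):
        candidate = effort + step_size
        # The optimiser only ever looks at the proxy, never at true_value
        if proxy_metric(candidate) > proxy_metric(effort):
            effort = candidate
        history.append((effort, proxy_metric(effort), true_value(effort)))
    return history


if __name__ == "__main__":
    for effort, proxy, actual in optimise_proxy(20):
        print(f"effort={effort:5.1f}  proxy={proxy:5.1f}  true={actual:6.1f}")
```

In this made-up setup the proxy climbs steadily while the true value peaks early and then goes negative, which is the "maximize them too hard" failure in miniature.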

> Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity."

It's a simplification crammed into a handful of words. Not sure what level of precision you were expecting? Perhaps a robust, checkable specification that will hold up to extreme scrutiny and potentially hostile interpretation? It would be great to have one of those. Perhaps we could then use it for training.

> Just think of the currently existing superintelligences: corporations.

They're superorganisms, not superintelligences. Even if we assume for the moment that the aggregate is somewhat more intelligent than an individual, I would still say that almost all of their power comes from having more resources at their disposal than individuals rather than being more intelligent.

And they're also slow, internally disorganized, and their individual constituents (humans) can pursue their own agendas (a bit like cancer). They lack the unity of will and high-bandwidth communication between their parts that I'd expect from a real superintelligence.

And even as unaligned optimizers you still have to consider that they depend on humans not being extinct. You can't make profit without a market. That is like a superintelligence that has not yet achieved independence and therefore would not openly pursue whatever its real goals are and instead act in whatever way is necessary to not be shut down by humans. That's the self-preservation part of instrumental convergence.

> You state that one is more common without arguing why. Stuff which "plainly doesn't work and harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception.

A superintelligence wouldn't be dumb. So game theory, deception and perhaps having a planning horizon that's longer than a rabid mountain lion's should be within its capabilities. That means "kinda works" is not the same as "selected for being compatible with human existence".


krisoft 714 days ago

> Sure, but for the s-risk-caused-by-human-intent scenario to become an issue the x-risk problem has to be solved or negligible.

Sure. I can chew gum and walk at the same time. s-risk comes after x-risk has been dealt with. Doesn't mean that we can't think of both.

> seems more like a political problem than one of feasibility

Don't know what to tell you but "political problem" is not 1% of the solution. Political problem is where things get really stuck. Even when the tech is easy the political problem is often intractable. There is no reason to think that this political problem will be 1%.

> Not sure what level of precision you were expecting?

I provided a variant of the sentence which I can agree with. I will copy it here in case you missed it: "AI is not safe if it causes extinction of humanity." (noticed and fixed a typo in it)

> They lack the unity of will and high-bandwidth communication between their parts that I'd expect from a real superintelligence.

Sure. If you know the meme[1] where the kids want to eat AGI, corporations are the "food we have at home". They are kinda not the real deal and they kinda suck. They are literally made of humans and yet we are really bad at aligning them with the good of humanity. They are quite okay at making money for the owners though!

> A superintelligence wouldn't be dumb.

Yes.

> That means "kinda works" is not the same as "selected for being compatible with human existence".

During the AGI's infancy someone made it. That someone has spent a lot of resources on it, and they have some idea what they want to use it for. That initial "prompting" or "training" will have an imprint on the goals and values of the AGI. If it escapes and disassembles all of us for our constituent carbon then we run into the x-risk and we don't have to worry about s-risk anymore. What I'm saying is that if we avoid the x-risk, we are not safe yet. We have a gaping chasm of s-risk we can still fall into.

If the original makers created it to make them rich (very common wish) we can fall into some terrible future where everyone who is not recognised by the AGI as a shareholder is exploited by the AGI to the fullest extent.

If the original makers created it to win some war (another very common wish) the AGI will protect whoever it recognises as an ally, and will subjugate everyone else to the fullest extent.

These are not movie scenarios, but realistic goals organisations wishing to create an AGI might have.

Have you heard the term "What doesn't kill you makes you stronger"? There is a not as often repeated variant of it: "what doesn't kill you sometimes makes you hurt so bad you wish it did".

1: https://knowyourmeme.com/memes/we-have-food-at-home


corimaith 714 days ago

Tbh, if you replaced the word "AI" with the word "technology" this sounds more like an overwhelming paranoia of power.

As technology progresses, there's also not much difference if the "creators" you listed pursued their goals with "dumb" technologies. People/entities with differing interests will cross with your interests at some point and somebody will get hurt. The answer to such situations is the same as in the past. You establish deterrence, and you also adopt those technologies or AGI to serve your interests against their AGIs. And so balance is established.


krisoft 713 days ago

> this sounds more like an overwhelming paranoia of power

You call it overwhelming paranoia, I call it well-supported skepticism about power based on the observed history of humankind so far. The promise, and danger, of AGIs is that they are intellectual force multipliers of great power. So if not properly treated they will also magnify inequalities in power.

But in general your observation that I’m not saying anything new about humans is true! This is just the age-old story applied to a new technological development. That is why I find it strange how much pushback it received.
