| > already succeeded with 99% of the alignment task and only failed at the last 1% of making it care in a way we'd prefer. Who is we? Humanity does not think with one unified head. I'm talking about a scenario where someone makes the AI which serves their goals, but in doing so harms others. AGI won't just happen on its own. Someone builds it. That someone has some goals in mind (they want to be rich, they want to protect themselves from their enemies, whatever). They will fiddle with it until they think the AGI shares those goals. If they think they didn't manage to do it they will strangle the AGI in its cradle and retry. This can go terribly wrong and kill us all (x-risk). Or it can succeed where the people making the AGI aligned it with their goals. The jump you are making is to assume that if the people making the AGI aligned it with their goals that AGI will also align with all of humanity's goals. I don't see why that would be the case. You are saying that doing one is 99% of the work and the rest is 1%. Why do you think so? > Of course we can always imagine some "movie plot" scenarios that happen to get some low-probability outcome by mere chance. Definitions are not based on probabilities. sanxiyn wrote "AI is safe if it does not cause extinction of humanity." To show my disagreement I described a scenairo where the condition is true (that is the AI does not cause extinction of humanity), but I would not describe as "safe AI". I do not have to show that this scenario is likely to show the issue with the statement. Merely that it is possible. > focusing one's worry on winning an anti-lottery rather than allocating resources to the more common failure modes. You state that one is more common without arguing why. Stuff which "plainly doesn't work and harmful for everybody" is discontinued. Stuff which "kinda works and makes the owners/creators happy but has side effects on others" is the norm, not the exception. Just think of the currently existing superinteligences: corporations. They make their owners fabulously rich and well protected, while they corrupt and endanger the society around them in various ways. Just look at all the wealth oil companies accumulated for a few while unintentionally geo-engineering the planet and systematically suppressing knowledge about climate change. That's not a movie plot. That's the reality you live in. Why do you think AGI will be different? |
(Different person)
I think it's much starker than that, more even than 99.99% to 0.01%; the reason is the curse of high dimensionality.
If you imagine a circle, there's a lot of ways to point an arrow that's more than 1.8° away from the x-axis.
If you imagine a sphere, there's even more ways to point an arrow that's more than 1.8° away from the x-axis.
It gets worse the more dimensions you have, and there's a lot more than two axies of human values; even at a very basic level I can go "oxygen, food, light, heat", and that's living at the level of a battery farmed chicken.
Right now, we don't really know how to specify goals for a super-human optimiser well enough to even be sure we'd get all four of those things.
Some future Stalin or future Jim Jones might try to make an AGI, "strangle the AGI in its cradle and retry" because they notice it's got one or more of those four wrong, and then finally release an AI that just doesn't care at all about the level of Bis(trifluoromethyl)peroxide in the air, and this future villain don't even know that this is bad for the same reason I just got that name from the Wikipedia "List of highly toxic gases" (because it is not common knowledge): https://en.wikipedia.org/wiki/List_of_highly_toxic_gases