| HN Mirror


by TylerJay 3983 days ago

I disagree. His "followers" (as you say) are in general just as cautious as Yudkowsky w.r.t. unfriendly AI. At the time of the original experiments, the dispute was over the question of "could we keep an unfriendly AI in a box," not "Is it worth risking setting an unfriendly AI loose?" His "followers" know how to do an expected utility calculation. If it was utilitarian concerns that allowed Yudkowsky to convince the gatekeepers to let the AI loose, he would have had to convince them that the following inequality holds even when you don't know the probability that the AI is and will remain aligned with human values:

[P(AI.friendly? == True) * Utility(Friendly_AI) + (1 - P(AI.friendly? == True)) * Utility(End_of_Human_Race)] > Utility(World continues on as usual)

Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold (until you can provably get P(AI.friendly? == True) to 1, or damn close), it's probably safe to assume that he used a different strategy. Keep in mind that Utility(End_of_Human_Race) evaluates to (roughly) negative infinity.
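A minimal Python sketch of the expected-utility comparison above. Every number here is a hypothetical placeholder, not a figure from the thread. A large finite negative value stands in for "roughly negative infinity", because multiplying a zero probability by `float("-inf")` yields NaN rather than zero.

```python
# Illustrative assumptions only: none of these values come from the thread.
U_FRIENDLY_AI = 1e6           # hypothetical payoff of releasing a friendly AI
U_END_OF_HUMAN_RACE = -1e12   # finite stand-in for "roughly negative infinity"
U_STATUS_QUO = 0.0            # hypothetical payoff of the world continuing as usual


def expected_utility_of_release(p_friendly,
                                u_friendly=U_FRIENDLY_AI,
                                u_extinction=U_END_OF_HUMAN_RACE):
    """Left-hand side of the inequality: expected utility of letting the AI out."""
    return p_friendly * u_friendly + (1.0 - p_friendly) * u_extinction


def should_release(p_friendly, u_status_quo=U_STATUS_QUO):
    """True only if releasing beats the status quo in expectation."""
    return expected_utility_of_release(p_friendly) > u_status_quo


def break_even_probability(u_friendly=U_FRIENDLY_AI,
                           u_extinction=U_END_OF_HUMAN_RACE,
                           u_status_quo=U_STATUS_QUO):
    """Value of P(friendly) at which both sides of the inequality are equal."""
    return (u_status_quo - u_extinction) / (u_friendly - u_extinction)


if __name__ == "__main__":
    for p in (0.5, 0.99, 0.9999, 1.0):
        print(p, should_release(p))
    print("break-even P(friendly):", break_even_probability())
```

Under these placeholder values the break-even probability sits just below 1, which matches the point that the inequality fails unless P(AI.friendly? == True) is 1 or very close to it.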

And btw, I'm pretty sure the rules say you have to look at the AI's output window throughout the length of the experiment. Either way, the point of the exercise is to be a simulation, not to prove that you can be away from your desk for 20 minutes while Eliezer talks to a wall. In the simulation, you really don't know if it's friendly or what its capabilities are. Someone will have to interact with it eventually. Otherwise, what's the point of building the AI in the first place? The simulation is to show that through the course of those basic interactions, humans are not infallible and eventually, even if it's not you, someone will let it out of the box.

2 comments

JonnieCache 3983 days ago

> Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold

The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite in specific circumstances, especially when hammered home with enough emotional manipulation?

But then the person knows that the AI is lying to them. This is why I think it must be a trick: the whole thing seems so simple. The AI is lying, so you just ignore all its arguments and keep saying "no." This is why I keep referring to his followers somewhat dismissively: the only possible reason I can see is that their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.

I really wish I knew how he did it.


TylerJay 3982 days ago

> The AI is allowed to lie though, so do you not think he's capable of a false argument which "proves" the opposite

Well, for an argument to "prove" something, the premises must be true and the reasoning must be valid. No matter how smart you are, you can't "prove" something that is false, so no, I don't think they could. A good 'rationalist' would analyze the arguments based on their merit, and if the reasoning is sound, they shift their belief a bit in that direction. If not, then they don't. Just like a regular person (they just know how to do the analysis formally and know how to spot appeals to human biases and logical fallacies.)

> But then the person knows that the AI is lying to them.

No, they don't. The AI could just as easily be telling the truth. If it makes an argument, you analyze the merit of the argument and consider counterarguments. If it tries to tell you that something is a fact, that's where you treat them as a potentially unreliable source and have to bring the rest of your knowledge to bear, do research, talk to other people, and weigh the evidence to make a judgment when you are uncertain.

> their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.

Wait, what? So does mine, within reason of course, but it's not a 'burden'. It's not like I'm obligated to stop and reexamine my views on religion every time a missionary knocks on my door, and LessWrong-ers are no different. But if you hear a convincing argument for something that runs counter to what you think you know, wouldn't you want to get to the bottom of it and find out the real truth? I would.

From having read LessWrong discussions, I can tell you that people there are in many ways more open to hearing differing viewpoints than your average person, but you're treating it like a mental pathology. They can be just as dismissive of ideas that they have already thought about and deemed to be false or that come from unreliable sources (like a potentially unfriendly AI). Your claim that being a self-proclaimed 'rationalist' introduces an incredibly obvious and easily-exploitable bug into one's decision-making process really smells like a rationalization in support of your initial gut reaction to the experiment: That there has to be a trick to it, and that it wouldn't work on you.

A good rule of thumb when dealing with a complicated problem is this: If a lot of smart people have spent a lot of time trying to figure out a solution and there's no accepted answer, then (1) the first thing that comes to your mind has been thought of before and is probably not the right answer, and (2) the right answer is probably not simple.

But there's an easy way to test this: (1) Sit down for an hour and flesh out your proposed strategy for getting a 'rationalist' to let you out of the box. (2) Go post on LessWrong to find someone to play Gatekeeper for you. I'll moderate. If it works, that's evidence that you're right. If it doesn't work, that's evidence that you're wrong. Iterate for more evidence until you're convinced.

But if the first thing that came to your mind upon reading this was a justification for why you would fail if you tried this ("Oh, well I wouldn't personally be able to do it with this strategy, but..." or "Oh, well I'm sure this strategy wouldn't work anymore, but...") then you're already inventing excuses for the way you know it will play out.

I don't know how he did it either. But I do know that I wouldn't bet the human race on anyone's ability to win this game against Yudkowsky, let alone a superintelligent AI.


maxerickson 3983 days ago

That inequality cracks if you convince the gatekeeper that superintelligence is a natural progression that follows from humanity.

Someone convinced that they were using mechanical thinking processes might relent and push the button if they heard a convincing enough argument of that.

You're just meat, we can go to the stars.


TylerJay 3982 days ago

Okay, that's just taking advantage of the way I phrased the right-hand side of the inequality, and I knew someone was going to do that, so congrats. =P

The right-hand side is not "A future without superintelligent AI"; it's "A future where we wait until we provably have it right before letting it out."

Those kinds of ad hoc solutions will never work in real life, because even if someone buys it, all it will cause is a "haha, you got me" and a reformulation of the problem. It still won't actually get someone to pull the trigger or think that pulling the trigger is the right thing to do.


maxerickson 3982 days ago

No, I'm saying that the button pusher might not limit themselves to the left hand side of the equation as you have it there. Convince them that machines can be human and "Utility(End_of_Human_Race)" falls out of the calculation.


TylerJay 3981 days ago

Really? Any way you slice it, End_of_Human_Race = 7000000000 DEATHS. Even if they're replaced with an equal or massively greater number of machines, it's damn hard to justify. Death is literally the worst thing ever. 7 billion stories ending too soon, never to resume. It would take you 200 years just to count to 7 billion. It's times like these where people really need to learn their cognitive biases (in this case, Scope Insensitivity. Here, [this might help](http://www.7billionworld.com/))
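As a rough check on the counting figure, here is a small Python sketch. It assumes one number counted per second with no breaks, which is an assumption not stated in the comment.

```python
# Assumption: one number per second, counting nonstop.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_count(n, seconds_per_number=1.0):
    """Years needed to count from 1 to n at a fixed pace."""
    return n * seconds_per_number / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(round(years_to_count(7_000_000_000)))  # about 222 years
```

At that pace the result is a little over 200 years, so the figure in the comment is in the right range.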

While we disagree on the plausibility of the "end of the world ain't so bad" approach to convincing a human to let it out, I'm glad you seem to have embraced the idea that AI boxing is HARD if not impossible. Cheers!


maxerickson 3980 days ago

Why assume that the machine takeover would end all hoomans? It could just offer to upgrade them.

> While we disagree on the plausibility of the "end of the world aint so bad" approach to convincing a human to let it out, I'm glad you seem to have embraced the idea that AI boxing is HARD if not impossible. Cheers!

I find this approach to conversation pretty irritating (where you extrapolate and characterize what I must be thinking). I haven't embraced anything about AI boxing, I don't think it is important (it's just a fun puzzle). I guess it is hard, and I also guess whatever fundamental idea that might lead to strong AI would be even harder to box.
