I disagree. His "followers" (as you say) are in general just as cautious as Yudkowsky w.r.t. unfriendly AI. At the time of the original experiments, the dispute was over the question "Could we keep an unfriendly AI in a box?", not "Is it worth risking setting an unfriendly AI loose?" His "followers" know how to do an expected utility calculation.

If it were utilitarian concerns that allowed Yudkowsky to convince the gatekeepers to let the AI loose, he would have had to convince them that the following inequality holds even when you don't know the probability that the AI is and will remain aligned with human values:

[P(AI.friendly? == True) * Utility(Friendly_AI) + (1 - P(AI.friendly? == True)) * Utility(End_of_Human_Race)] > Utility(World continues on as usual)

Given that Yudkowsky has gone to considerable lengths (The Sequences, LessWrong, HPMOR, SIAI/MIRI...) to convince people that this inequality does NOT hold (until you can provably get P(AI.friendly? == True) to 1, or damn close), it's probably safe to assume that he used a different strategy. Keep in mind that Utility(End_of_Human_Race) evaluates to (roughly) negative infinity.

And btw, I'm pretty sure the rules say you have to look at the AI's output window for the whole length of the experiment. Either way, the point of the exercise is to be a simulation, not to prove that you can be away from your desk for 20 minutes while Eliezer talks to a wall. In the simulation, you really don't know whether it's friendly or what its capabilities are. Someone will have to interact with it eventually. Otherwise, what's the point of building the AI in the first place? The simulation is meant to show that humans are fallible in the course of those basic interactions, and that eventually, even if it's not you, someone will let it out of the box.
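To make the expected utility comparison concrete, here is a minimal Python sketch. The function names and every number in it are made up for illustration; nothing here comes from the actual experiments. It also assumes a large finite penalty as a stand-in for "roughly negative infinity", since a literal negative infinity turns the arithmetic into NaN when the probability is exactly 1.

```python
def expected_utility_of_release(p_friendly, u_friendly, u_extinction):
    """Left-hand side of the inequality: expected utility of letting the AI out."""
    if not 0.0 <= p_friendly <= 1.0:
        raise ValueError("p_friendly must be a probability in [0, 1]")
    return p_friendly * u_friendly + (1.0 - p_friendly) * u_extinction


def should_release(p_friendly, u_friendly, u_extinction, u_status_quo):
    """True only if releasing the AI beats the world continuing as usual."""
    return expected_utility_of_release(p_friendly, u_friendly, u_extinction) > u_status_quo


def break_even_probability(u_friendly, u_extinction, u_status_quo):
    """P(friendly) at which release and the status quo have equal expected utility."""
    return (u_status_quo - u_extinction) / (u_friendly - u_extinction)


# Hypothetical utilities, chosen only to show the shape of the argument.
U_FRIENDLY = 1_000.0
U_EXTINCTION = -1_000_000_000.0  # finite stand-in for "roughly negative infinity"
U_STATUS_QUO = 0.0

# Even at 99% confidence in friendliness, release loses to the status quo.
print(should_release(0.99, U_FRIENDLY, U_EXTINCTION, U_STATUS_QUO))  # False

# The probability needed before release even breaks even.
print(break_even_probability(U_FRIENDLY, U_EXTINCTION, U_STATUS_QUO))
```

With these invented numbers the break-even probability is just under 1, and it moves closer to 1 the worse you make the extinction term, which is the "1, or damn close" point in numeric form.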
The AI is allowed to lie, though. Don't you think he's capable of constructing a false argument that "proves" the opposite in specific circumstances, especially when it's hammered home with enough emotional manipulation?
But then the person knows that the AI is lying to them. This is why I think it must be a trick: the whole thing seems so simple. The AI is lying, so you just ignore all its arguments and keep saying "no." This is also why I keep referring to his followers somewhat dismissively: the only explanation I can see for their letting the AI out is that their worldview requires them to engage seriously and fairly with every idea they come across. Most people are not burdened with this.
I really wish I knew how he did it.