| I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean. The Yudkowskyian definition of rationality is that which wins, for the relevant definition of "win". Specifically, if there is some clever argument that makes perfect sense that tells you to destroy the world, you still shouldn't destroy the world immediately, if the world existing is something you value. It's a meta-level up: you being unable to think of a counterargument isn't proof, and the destruction of the world isn't something to gamble with. Yes, Yudkowsky likes thought experiments dealing with the edge cases. Yes, 3^^^^^3 grains of sand is a thought experiment that produces conflicting intuitions. Yes, the edge cases need to be explored. But in a life-or-death situation (and the destruction of the world qualifies as this 7 billion times over), you don't make your decisions on the basis of trippy thought experiments. (Especially novel ones you've just been presented with. And ones that have been presented by an agent which has good reasons to try to trick you.) So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here. > He's evidently a very charismatic and persuasive guy Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, a superintelligent AI would have an even easier time of it. Long story short, it doesn't matter how he did it. All that matters is that it can be done. It can be done even by a "mere" human. If he can do it, a superintelligence with all of humanity's collected knowledge of psychology and cognitive science could do it too, and likely in a fraction of the time. |
However, let me be clear: how he did it is the only thing I care about. I am not convinced that the threat of superintelligence merits our resources compared to other, more concrete problems. To me the experiment is not meaningfully different from stories of the temptation of Christ in the desert. Except it's more fun than that story, because Yudkowsky is a more interesting character than Satan.
EDIT: If rationality is about winning, what could be simpler than a game where you just keep repeating the same word in order to win? It seems like almost the base case for rationality, if one accepts that definition.
I would submit that an unstated definition of rationality is "dealing with difficult, complex situations in one's life algorithmically", i.e. most of HPMOR and the large amounts of self-help stuff on LW. Someone who had internalized this stuff would be more vulnerable than the average person to "Spock-style bullshit", to reuse that unfortunate phrase.