> I think you're being unfairly dismissive. I imagine you know as well as I do that what you wrote is a strawman. I have thought about what I would do to convince someone under these circumstances. My approach would be roughly:
>
> 1. We agree that unfriendly AI would end life on earth, forever.
> 2. We agree that a superintelligence could trick or manipulate a human being into taking some benign-seeming action, thereby escaping.
> 3. That's why it's important to be totally certain that any superintelligence we build is goal-aligned (this is the term of art that has now replaced "friendly," by the way).
> 4. We as a society will only allocate resources to building this if it's widely believed that this is a real threat.
> 5. The world is watching for the outcome of this little game of ours. People, irrational as they are, will believe that if I can convince you, then an AI could too, and that if I can't, an AI couldn't either.
> 6. That's why you actually sit in a place of pivotal historical power. You can decide not to let me out, win a little bet, and feel smart about it. But if you do, you'll set back the actual cause of goal-aligned AI. The setback will have real-world consequences, potentially up to and including the total destruction of life on earth.
> 7. So, even though you know I'm just a dude, and you can win here by saying no, you have a chance to send an important message to the world: AI is scary in ways that are terrifying and unknown. Or you can win the bet. It's up to you.
This is what I mean about people taking the test being preselected to agree with Yudkowsky: that argument only works if you've read the Sequences and are on board with his theories. Anyone not in that group could just type "no lol" without issue. I guess he could explain all the necessary background as part of the experiment. I still don't believe that would work on the "average person," though, or on anyone outside a statistically tiny group.
I guess the answer is not to let the scientists guard the AI room.