by FBT 3983 days ago
> You or I would surely just put a drinking bird on the "no" button à la homer simpson, and go to lunch.

Well, if you read the rules the game was played under, this is explicitly called out as forbidden:

> The Gatekeeper must actually talk to the AI for at least the minimum time set up beforehand. Turning away from the terminal and listening to classical music for two hours is not allowed.

The point of this is to simulate the interaction of the AI with the Gatekeeper. Walking away and not paying attention doesn't prove anything relevant to the test.

> Personally, I think he talked about how much good for the world could be done if he was let out, curing disease etc. Because his followers are bound by their identities as rationalist utilitarians, they had no choice but to comply, or deal with massive cognitive dissonance.

This... isn't really valid reasoning. The starting assumption here is that if the AI gets out, it will be able to affect the world to a vast extent, in a pretty much arbitrary direction. The point of this experiment is that the direction is pretty much unknown, and thus must be assumed potentially dangerous. This is the whole reason it's in the box in the first place.

The kicker is that whatever it plans to really do when it gets out, if talking about the good it could do would get it out, it will talk about that, regardless of what it plans to actually do. That's just good strategy.

It can claim whatever it wants. It's allowed to lie. All participants know this. I can confidently assert that this isn't the solution.

One last note: I would be very wary of rationalwiki.org in this context. Some of the rationalwiki people have a longstanding unexplained vendetta against Yudkowsky, and many of their articles on him and the stuff he does need to be taken with a certain grain of salt.


While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.

WRT lying: I think there's some logical trickery at work which makes it worth you giving the AI the benefit of the doubt, along the lines of the 3^^^^^3 grains of sand thing. Something which exploits the rationalist worldview. Although thinking about it again, you can always balance out the prospect of infinite goodness with the fear of the AI sending everyone to infinite hell. Essentially I believe Yudkowsky uses some logical-linguistic trick to find an asymmetry there.

OTOH if he had some novel philosophical device like that he would have written it up as a blog post by now. He's evidently a very charismatic and persuasive guy, and people playing the game are selected to be sympathetic to his worldview. He probably just persuaded them using ordinary psyops methods, like TeMpOrAl said.

> While you're not allowed to turn away from the screen, you could certainly do the mental equivalent, while still carrying on the conversation. I admit this isn't really in the spirit of the game though.

How many people bought timeshare because they turned up to a sales pitch in order to claim some free gift? "We'll go, get our gift, and just keep saying 'no' to the sales pitch".

http://www.moneycrashers.com/attending-timeshare-presentatio...

How many people end up paying for something because they couldn't be bothered to cancel the deal after the free period is gone? It's an age-old sales tactic, used in everything from magazine subscriptions to Spotify (I'm guilty myself of not cancelling the latter when I no longer needed it).

And I think the mental equivalent of the drinking bird is actually very much in the spirit of the experiment: the point is that people can't reliably do even something as simple as deciding to refuse no matter what and keeping that commitment.

How many of those people were high-IQ timeshare experts though, with extensive knowledge of the potential for timeshares to destroy the entire universe?

You would think that the various self-knowledge and introspective exercises promoted by Yudkowsky would immunize people against simple timeshare-style persuasion. This is why I think he uses rationalism itself to trap people. Like someone said, the basilisk thing seemed pretty effective.

I think you're rather fixated on a certain conception of "rationality" which is more like Mr. Spock than like what Yudkowsky uses it to mean.

The Yudkowskyian definition of rationality is that which wins, for the relevant definition of "win".

Specifically, if there is some clever argument that makes perfect sense and tells you to destroy the world, you still shouldn't destroy the world immediately, if the world existing is something you value. It's a meta-level up: your being unable to think of a counterargument isn't proof, and the destruction of the world isn't something to gamble with.

Yes, Yudkowsky likes thought experiments dealing with the edge cases. Yes, 3^^^^^3 grains of sand is a thought experiment that produces conflicting intuitions. Yes, the edge cases need to be explored. But in a life or death situation (and the destruction of the world qualifies as this 7 billion times over), you don't make your decisions on the basis of trippy thought experiments. (Especially novel ones you've just been presented with. And ones that have been presented by an agent which has good reasons to try to trick you.)

So, no. Again, a "logical-linguistic trick" might work on Mr. Spock, but we're not talking about Mr. Spock here.

> He's evidently a very charismatic and persuasive guy

Exactly. That's the point. If even a normal charismatic and persuasive guy can convince people to let him out, a superintelligent AI would have an even easier time of it.

Long story short, it doesn't matter how he did it. All that matters is that it can be done. It can be done even by a "mere" human. If he can do it, a superintelligence with all of humanity's collected knowledge of psychology and cognitive science could do it too, and likely in a fraction of the time.

You're right that I've been unfairly dismissive of him, and made my objections somewhat too bluntly. At least it's fostered a discussion.

However, let me be clear: how he did it is the only thing I care about. I am not convinced that the threat of superintelligence merits our resources compared to other concrete problems. To me the experiment is not meaningfully different from stories of the temptation of Christ in the desert. Except more fun than that story, because Yudkowsky is a more interesting character than Satan.

EDIT: if rationality is about winning, what could be simpler than a game where you just keep repeating the same word in order to win? It seems like almost the base-case for rationality, if one accepts that definition.

I would submit that an unstated definition of rationality is "dealing with difficult, complex situations in one's life algorithmically", i.e. most of HPMOR and the large amounts of self-help stuff on LR. Someone who had internalized this stuff would be more vulnerable than the average population to "spock-style bullshit", to reuse that unfortunate phrase.

Well, then let's see what we can agree on. I hope that you can agree that if one was to consider superintelligence a serious threat which needs dealing with, then AI boxing isn't the way to go in dealing with it?

That's what he was trying to show in all this, and I think that the point is made. How seriously to take superintelligent AIs is a different issue that he talks about elsewhere, and should be dealt with separately. But if you or someone else were to try to deal with it seriously, I'm pretty sure that you'd agree with me that the way to go about it isn't just boxing the AI and thinking that solves everything, right?

Oh yes, I agree with that premise. It's hard to disagree with. Milgram, the art of Sales plus the aforementioned Derren Brown and his many layers of deception are enough to make the point.

I suppose it's unfortunate that he came up with such an amazingly provocative way of demonstrating his argument; it has somewhat eclipsed the argument itself. I am definitely a victim of nerd sniping here. It must be the open-ended secrecy that does it.

> if you read the rules the game

How can you assume any of the rules were followed if that was never verified by a third party?

You can't talk about what happened during the game _in specifics_; you can of course confirm that the game was played according to the rules and that the outcome was not misreported.

Here's the thing though: Depending on what was said in the conversation BOTH parties may have a vested interest in keeping the specifics secret. Only via an independent third party observer can there even be a remote chance [Edit: of knowing] that any rules were followed.

We have Eliezer winning three games as an AI. That's at least four people who you think are just outright lying.

Plus, the other two players who won as gatekeepers - Eliezer would presumably have tried to cheat against them, too.

> That's at least four people who you think are just outright lying.

I'm saying that there is a chance of that being the case, but that without any kind of third party confirmation we cannot know either way. Also see homeopathy.

> Eliezer would presumably have tried to cheat against them, too.

Not necessarily. Losing occasionally is a good strategy when running a con.

You've gone from there being only a "remote chance" that the rules were followed, and "I don't trust anyone involved" - to there being "a chance" that they were broken.

Under common interpretations of those phrases, that's a massive swing in your confidence levels.

If you'd go to that level of collusion, you could just fake logs. At the point where both sides are in on it, there's basically nothing that they could say that would be convincing.

You can always assume all participants lied about how the game went. Just add an implicit "assuming they didn't, ..." and the discussion is still valid.

At that point any discussion is moot though, since the only point of discussion is "what exact argument was used to convince", yet if both parties lied, then there is no such argument in the first place.

Since neither party is going to disclose the exact arguments, this discussion is still equivalent to "what arguments could be used to convince..." and you can have it regardless of whether or not the parties lied about the experiment's result.

We're going to have to disagree on the value of such a conversation. :)

Fair enough :).