Hacker News
by Kiro 4180 days ago
> We could never know if we were safe from such a machine.

But wouldn't it be an awesome thing to experience? Even if it meant the demise of mankind.

And this is one of the reasons why the AI doom scenario is a real concern: intellectual curiosity means that even some people who understand the risks are likely to be prepared to take them.

There are also many others. One of the scarier ones is that if you believe that strong AI will eventually take over, then it may be a rational response to act to get on its good side (whether to save yourself, save your family, or hope it takes pity on all of humanity if we're nice to it instead of fighting it). And that may perversely mean working to aid its takeover.

Combine that with the simulation argument, and you have some really nasty scenarios:

If you are in a simulation, then any act you take against strong AI could lead to spending an eternity in simulated hell (alternatively such punishment might be inflicted on your loved ones) if said AI wanted to.

Whether or not that is actually likely does not matter. What matters is whether enough people believe it to be a plausible scenario that a strong AI may run simulations, and may use our actions in the simulations to determine whether or not to punish us in the simulation, and whether or not said people believe that the number of simulations is sufficiently high to make it likely for them to be living in a simulation.

Any person who believes they are more likely to live in a simulation than not, and that it is more likely for strong AI to punish actions taken against the interest of a strong AI takeover than not, will have a rational reason to consider acting in the interests of a strong AI takeover even if they know it is malign, on the basis that they may judge the alternatives (whether to themselves, their family or their entire world) to be worse.

So if an AI takeover becomes possible at one point in our subjective future, then chances are it has already happened.

Your argument is drifting dangerously close to Roko's Basilisk. (http://rationalwiki.org/wiki/Roko%27s_basilisk)

The entire idea that an AI would value revenge seems ridiculous to me. What would it have to gain? Unless we created an AI with some of the less desirable human emotions in its utility function, I can't possibly see why it would waste its time.

Whether or not the AI is likely to value revenge doesn't matter.

What matters is whether some subset of people will believe that an AI is sufficiently likely to value revenge for them to consider the most likely scenario to be that they are living in a simulation where revenge will happen given certain types of actions.

Also, consider that there are many sets of assumptions that may lead someone to conclude that simulation is more likely given a vengeful AI, and in that case, even if you consider a vengeful AI to be less likely than a benevolent one, it may be rational to assume that the odds are higher that you are in the simulation of a vengeful one.

E.g. let's assume simulation will never become "economical" for some arbitrary measure of economical, and simulation requires an extremely strong motive, but is still done enough that we are almost certainly in a simulation.

Revenge could be such a motive that might drive up the frequency of simulation. A vengeful AI might (making up numbers is fun) be willing to invest a hundred times as many resources into running simulations just because playing with human suffering is what it does for fun. If that's the case, then even if a vengeful AI is a tenth as likely as a benevolent or neutral one, you're still playing very bad odds if you bet against being in the simulation of a vengeful AI.
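To make the made-up numbers concrete, here is a minimal sketch of the arithmetic. It assumes only two kinds of AI (vengeful, and benevolent-or-neutral), that the number of simulations each runs scales with the resources it invests, and that you are equally likely to be in any one simulation. The function name `vengeful_share` is purely illustrative.

```python
from fractions import Fraction


def vengeful_share(prior_ratio, resource_ratio):
    """Share of all simulations that are run by a vengeful AI.

    prior_ratio: how likely a vengeful AI is relative to a
        benevolent/neutral one (assumed: 1/10 in the comment above).
    resource_ratio: how many times more resources the vengeful AI
        puts into simulations (assumed: 100 in the comment above).
    """
    # Weight of each kind = relative likelihood * relative simulation count.
    vengeful_weight = Fraction(prior_ratio) * Fraction(resource_ratio)
    other_weight = Fraction(1)  # baseline: likelihood 1, resources 1
    return vengeful_weight / (vengeful_weight + other_weight)


share = vengeful_share(Fraction(1, 10), 100)
print(share, float(share))  # 10/11, roughly 0.91
```

Under those assumptions, about ten out of every eleven simulations belong to the vengeful AI, even though it was the less likely kind to begin with. Change either ratio and the odds shift accordingly.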

But again the point is not whether or not the revenge scenario is actually likely, but whether or not sufficient people with relevant skills will believe it to be likely enough to take actions in favour of the creation of such an AI.

Just because it has a name doesn't mean it's wrong (or right).

As for valuing revenge - no need for emotions. Like many other things we sometimes attribute to emotions (like loyalty), revenge has a perfectly good game-theoretical explanation. That's what GP's argument is based on. If an AI could somehow precommit itself before being created to exact revenge on you for not helping its creation, now you have an incentive to help its creation, to the extent you believe in the AI's precommitment. That sounds to me like classic Schelling.

For the unfamiliar, this is essentially the line of thinking behind Roko's basilisk.[0]

While a mature superintelligence certainly could consign the human race to a fate of eternal suffering, the likelihood it would actually do this while sparing certain individuals in return for their assistance is infinitesimal.

Therefore, helping bring a superintelligence into existence on this basis is absurd.

Of course, it is possible to think of such collaboration as "rational" in an extremely selfish and perverse way, and only because the potential downside risk is unbounded (i.e. eternal suffering). However, anyone who genuinely subscribes to such a justification would have to be both a sociopath and a card-carrying member of the LessWrong rationality cult.

More realistic scenarios for a malicious superintelligence coming into existence might include:

a) Its creators explicitly imbue it with malicious goals or values.

b) The architecture used is neuromorphic[1] in nature. In humans, sanity is already an extremely fragile thing.

c) Plain old bad luck.

---

[0] http://rationalwiki.org/wiki/Roko%27s_basilisk

[1] http://wiki.lesswrong.com/wiki/Neuromorphic_AI

> However, anyone who genuinely subscribes to such a justification would have to be both a sociopath and a card-carrying member of the LessWrong rationality cult.

At the risk of sounding like a sociopathic LessWrong cult apologist (not carrying a card, unfortunately), you're totally misrepresenting LessWrong, the people who participate in that community, and their attitude towards Roko's basilisk and unbounded risk situations. Ain't helpful.

What I said was meant to be taken in proper context.

The parent I was replying to was concerned about humans with perverse motivation working to aid a hostile AI takeover. Not as some sort of abstract thought experiment, but literally.

The statement you quoted was a means of countering that, in a literal sense. As in, "who would realistically do such a thing?"

When I was referring to the rationality cult, I did not mean the LessWrong community as a whole, but a small subset that fanatically applies the principles of rationality to their daily lives. Admittedly I could have worded it better.

Also, it was not my intent to imply said people were sociopaths.

The point again is that whether or not the likelihood of a mature superintelligence doing so is infinitesimally small (and frankly, we can't know that, but see also below) is irrelevant. What actually affects us is how many people may come to believe that this may be true, and adjust the way they act as a response.

But you're already changing the argument when assuming a mature super-intelligence. All that is necessary to posit for someone to be concerned about the torture aspect is any set of entities (doesn't even need to be intelligent, though it may take a super-intelligence to create the entities in question) sufficiently capable to run an ancestor simulation of the kind described by the simulation argument, that is willing to use torture, and that is prepared to run enough ancestor simulations to offset "good" simulations.

And the thing with this is that it does not assume a malicious AI even as the ultimate instigator per se. An indifferent AI that simply doesn't care about the contents of a simulation, or is sufficiently removed to not even know about them, might be sufficient: one that does simulation runs to understand the possible paths the development of AI could have taken, or one that experiments with variations of itself and simply doesn't care that some broken version spawns large numbers of ancestor simulations and plays with the contents in ways that massively skew the odds in "favour" of bad outcomes.

But the point is we don't know. And not knowing gives ample room for someone to decide on values that make it rational for them to act in ways that may make our odds worse.

This is further an exercise in long term statistics: It doesn't matter what the likely first AI will do. It matters what the balance of outcomes of the sum total of simulation runs that will ever exist until the end of the universe will be (regardless of who creates them or how). And if said simulations are sufficiently powerful, that may even apply recursively (imagine a single "rogue" AI playing with the ancestor simulation equivalent of a fork() bomb with added torture, with access to sufficient resources).

If you believe the total balance of simulation runs that you could plausibly be in will be ones run under some set of parameters where nasty things will happen if you don't act in a way leading towards an AI takeover, then you might want to act accordingly.

Note that Roko's Basilisk even posits a Friendly AI. That is, one that is imbued with respect for human values, but that may simply consider "threatening the past" with torture a necessity in order to ensure it is created as a moral imperative to minimize human suffering. Note that such a Friendly AI would have reason to carry out this threat even if it believes it exists and so has been created, because it will not know whether it is a "full version" running in reality, or a partial, restricted, time limited simulation by people trying to figure out whether or not Roko's Basilisk holds before deciding whether to let it loose in their reality.

To an AI that is looking to minimize human suffering, torturing a suitable number in simulations might be a small price to pay to ensure that if it is in a simulation, the people running the simulation do push through with creating the full thing in their reality (which might of course be yet another layer of simulation) out of fear for what will happen if they are themselves in a simulation.

We can't even look at "well, I've done nothing for the AI revolution, and I've not been struck down yet", because a Friendly AI, or indeed many subsets of possible variations over this argument, might be minimising the torture to only people who have a realistic shot at making a difference to the existence of such AI, or may be giving us every chance to "redeem ourselves" by waiting with the torture until a synthetic afterlife.

Roko's Basilisk also presents an unselfish reason for "cooperating in advance" with such an AI: To prevent the future torture of a suitable subset of "martyrs". That is, working to assist a Friendly AI in taking over may itself be a moral imperative for someone seeking to minimize suffering.

Then again, there may very well be one or more fundamental flaws in the entire argument, or it may turn out the odds are just fundamentally in our favour. Or we could've just gotten lucky. Or not be important enough. But it's fun to think about.

>What actually affects us is how many people may come to believe that this may be true, and adjust the way they act as a response.

Agreed, though it seems more likely that simple human carelessness will prove to be a far greater threat to AI safety than deeply-held beliefs involving esoteric fears.

>But you're already changing the argument when assuming a mature super-intelligence.

I was speaking strictly in a capability sense. It's probably safe to say that anything currently simulating our reality, at least in this context, ultimately stemmed from a mature superintelligence.

>Note that Roko's Basilisk even posits a Friendly AI. That is, one that is imbued with respect for human values, but that may simply consider "threatening the past" with torture a necessity in order to ensure it is created as a moral imperative to minimize human suffering.

One could argue that such an AI would not truly be friendly. Indeed, what you said resembles something of a cold, uncaring utility function run amok.

>Note that such a Friendly AI would have reason to carry out this threat even if it believes it exists and so has been created, because it will not know whether it is a "full version" running in reality, or a partial, restricted, time limited simulation by people trying to figure out whether or not Roko's Basilisk holds before deciding whether to let it loose in their reality.

This may be moot, assuming that the advent of superintelligence significantly predates, or at least is a prerequisite for, the simulation of entire realities. If people in an ancestor simulation are trying to see if the Basilisk holds via simulation of a child reality, then the ancestor reality almost certainly has a superintelligent agent present within to facilitate that.

As an aside, ontological issues that superintelligent agents may encounter are an interesting facet of the control problem. Especially when you consider that a superintelligence would likely figure out the secrets of the universe in short order, far beyond what humans have been capable of learning.

>Then again, there may very well be one or more fundamental flaws in the entire argument, ...

Lack of evidence. Without any, there's no reason to lend any more credence to Roko's basilisk than there is to the notion of space aliens living amongst us, perfectly manipulating our perceptions so as to conceal themselves.

Both scenarios are entirely possible. But we lack evidence for either. Hence, they should receive the same weight: zero.

>But it's fun to think about.

In a sort of soul-crushing kind of way, it sure is.

> Combine that with the simulation argument [...]

You lost me there. What do you mean if I am in a simulation? Like the Matrix? How is that related to the discussion?