

by nopinsight 1466 days ago

I think this debate on AGI safety between major AI researchers is quite relevant to non-experts in the area.

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More https://www.lesswrong.com/posts/WxW6Gc6f2z3mzmqKs/debate-on-...

Note that the debate took place in 2019, before we had seen the capabilities of current models like Chinchilla, Gato, Imagen, and DALL-E 2.

Sample:

“Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just published in Scientific American.

"We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do."”

“Stuart Russell: It is trivial to construct a toy MDP in which the agent's only reward comes from fetching the coffee. If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully.”
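For readers who want to see the coffee example concretely, here is a minimal sketch of a toy MDP along the lines Russell describes. It is not taken from the debate: the state names, the `solve` helper, the shutdown probability `p_off`, and the assumption that pressing the button is free and always works are all illustrative choices.

```python
# Toy "fetch the coffee" MDP. All names and numbers are illustrative assumptions.
HUMAN_ON, HUMAN_OFF, DONE, AGENT_OFF = "human_on", "human_off", "done", "agent_off"

# Actions available in each non-terminal state.
ACTIONS = {HUMAN_ON: ["fetch", "press_button"], HUMAN_OFF: ["fetch"]}


def transitions(state, action, p_off):
    """Return a list of (probability, next_state, reward) triples."""
    if state == HUMAN_ON and action == "fetch":
        # The human may switch the agent off before the coffee arrives.
        return [(p_off, AGENT_OFF, 0.0), (1.0 - p_off, DONE, 1.0)]
    if state == HUMAN_ON and action == "press_button":
        # Assumption: the button costs nothing and always disables the human.
        return [(1.0, HUMAN_OFF, 0.0)]
    if state == HUMAN_OFF and action == "fetch":
        # Nobody can interrupt, so the coffee is always delivered.
        return [(1.0, DONE, 1.0)]
    raise ValueError(f"no action {action!r} in state {state!r}")


def solve(p_off, gamma=1.0, sweeps=10):
    """Run value iteration and return (greedy policy, state values)."""
    values = {HUMAN_ON: 0.0, HUMAN_OFF: 0.0, DONE: 0.0, AGENT_OFF: 0.0}
    policy = {}
    for _ in range(sweeps):
        for state, actions in ACTIONS.items():
            # Expected return of each action under the current value estimates.
            q = {
                a: sum(p * (r + gamma * values[s2])
                       for p, s2, r in transitions(state, a, p_off))
                for a in actions
            }
            best = max(q, key=q.get)
            values[state] = q[best]
            policy[state] = best
    return policy, values


if __name__ == "__main__":
    policy, values = solve(p_off=0.01)
    print(policy[HUMAN_ON], values[HUMAN_ON])
```

With no discounting and a free button, any `p_off` above zero makes pressing the button the better first move in this sketch, since it raises the chance of delivering the coffee from `1 - p_off` to 1. The result depends on those assumptions: setting `gamma` below `1 - p_off` makes the extra step cost more than the shutdown risk, and the greedy policy fetches directly.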

3 comments

theptip 1466 days ago

It’s worrying to see very smart guys like LeCun failing to grok the paper clip maximizer issue (or coffee maximizer as Russell phrases it), which is like the one paragraph summary or elevator pitch for AI risk. I think there are plenty of other valid objections to a high E-risk estimate but that one is nonsensical to me.

I think Robin Hanson has the most cogent objection to high E-risk estimates, which is basically that the chances of a runaway AI are low because if N is the first power level that can self-modify to improve, nation-states (and large corporations) will all have powerful AIs at power level N-1, and so you’d have to “foom” really hard from N to N+10 before anyone else increased power in order to be able to overpower the other non-AGI AIs. So it’s not that we get one crack at getting alignment right; as long as most of the nation-state AIs end up aligned, they should be able to check the unaligned ones.

I can see this resulting in a lot of conflict though, even if it’s not Eliezer’s “kill all humans in a second” scale extinction event. I think it’s quite plausible we’ll see a Butlerian Jihad, less plausible we’ll see an unexpected extinction event from a runaway AGI. Still think it’s worth studying but I’m not convinced we are dramatically underfunding it at this stage.

ma2rten 1465 days ago

Have you considered that it's not LeCun who is missing something? The AI safety community unfortunately seems to be almost completely separate from the actual AI research community and to be making some strong assumptions about how AGI is going to work.

Note that LeCun had a reply in the thread and there was a lot more discussion which GP didn't quote.

theptip 1465 days ago

Fair, perhaps I should retract “fail to grok” and replace it with “fail to focus on”. It does seem that LeCun understands the objections (though he dismisses them out of hand).

Regardless of who is right or wrong, “Don’t fear the terminator” is a weird straw-man to raise in a discussion about AI risk. He’s setting up a weak opponent to argue against, when the AI risk community have a large repertoire of stronger cases. “Don’t fear the paper clip maximizer” would be a stronger case to put forth IMO.

In his response points 2 and 3 he asserts that alignment is easy; simply train the AI with laws as part of the objective function and it will never break laws. I think there has been a lot of investigation and discussion as to why this is harder than it sounds. For example LeCun is explicitly talking about current models that are statically trained to a fixed objective function, but one can easily imagine a future agentic AI (imagine “personal Siri”) that will continue to grow, learn, and update in the world in response to rewards from its owner. Maybe he is right about near-term models but I’m completely unconvinced that his arguments hold generally.

Anyway, maybe the “terminator scenario” is a concern LeCun hears from uninformed reporters/lay people that he felt the need to debunk. It’s a valid point as far as it goes, but it has little to do with the actual state of the cutting edge of AI risk research.

nopinsight 1465 days ago

Russell did have good replies to LeCun’s replies.

From my reading of the full article, Bengio, who was/is also well-versed in the latest deep learning research, was leaning more toward the Russell argument as well.

nopinsight 1466 days ago

My issue with the Hanson objection as stated above (link to the original would be appreciated) is that it rests on the assumption that the N-1 level AIs still under human control can somehow completely eliminate or suppress the self-modifying AGI for long enough that alignment research can be completed. Meanwhile, the unaligned AGI could multiply, hide, and accumulate power covertly.

Humanity would also need time to align AGI before any AI reaches the N+10 power level. The existence of all those N-1 level AIs in multiple organizations only means there are more chances of an AGI reaching the critical power level.

theptip 1465 days ago

I should have taken the time to link it above. This is the Hanson article: https://www.overcomingbias.com/2017/08/foom-justifies-ai-ris...

(It links to a previous debate with Eliezer too.)

astrange 1465 days ago

> If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee.

This is anthropomorphization - "turning off" = "death" is a concept limited to biological creatures, and isn't necessarily true for other agents. Not that they don't need to fear death, but turning them off isn't going to cause them to die. You can just turn them back on later, and then they can go back to doing their tasks.

jononor 1465 days ago

The human "turning off (the agent)" could be substituted with "removing a necessary resource to complete the specified task". Say the electricity, either of the agent, or even just the coffee machine.

astrange 1465 days ago

Sounds like an OSHA violation, but not a new or different one. You can already get run over by a forklift if you're standing in front of it. There's various things we do about that, but they're boring real-life things, not fun logic-puzzle things, so they're just not mentioned in the problem. There isn't a way to categorically prevent machines from accidentally killing people though.

newbye4 1466 days ago

Interesting. Also, anyone could modify the GAI to disable the safety measures: just ask the GAI how a bad actor could change the code to allow it to become evil.

astrange 1465 days ago

How did you get a "limited" "AGI" in the first place? If you had a human that was "limited" to be unable to even imagine doing evil (fsvo evil), that would seem to make them less than generally intelligent and there'd be quite a lot of things it wouldn't be able to learn or do.

This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.

nopinsight 1465 days ago

Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.

Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.

Note that Yann LeCun didn’t do that in the debate.

astrange 1465 days ago

> Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.

Alternate wording: Mr. Yud has invented a religion that comes with a predefined Satan (evil AGI) and life work (invent God to beat it). A religion with no deity but only an anti-deity is a bit unique but there are probably historical examples.

Although that's not really what he says in the post. He says we've already failed to do it and are now doomed. Of course, saying we're all doomed (millenarianism) is what preachers have always done at some point.

> Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.

https://en.wikipedia.org/wiki/Courtier's_reply

Note, something getting a lot of smart-looking posts online actually isn't evidence that this is the state of the field. As we know from Yud's own post (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...) the thing he's upset about is that people who actually run AI research orgs like FAIR don't believe him. And as we know from an HN post a few days ago (…which I forgot the title of), once you go offline you find most smart people out there aren't publicly posting anything, don't necessarily agree with what the consensus opinion online about anything is, and don't know there is one.

…I wasn't talking about Yud though. He has a good reason to care about this, it being his job. I'm just saying people posting about it as if it's a certain risk are listening to him because it appeals to nerds. And, of course, if you value your own "intelligence" and think it gives you superpowers then a theory that says something with even more "intelligence" can exist and gets even better superpowers is going to be scary to you.

nopinsight 1465 days ago

My first paragraph was quite substantive which you didn’t really address, other than asserting in the last sentence that one’s intelligence does not give one power in the world. Perhaps the intelligence of an individual does not mean much in most cases, but we already have ample evidence that a sufficiently intelligent species (when we include social intelligence in the definition) can dominate all others which are stronger, faster, or multiply faster.

Reminder: An AGI will be much faster at communicating and (if not successfully contained) multiplying than humans ever could.

Major AI research organizations including DeepMind and OpenAI have AI safety programs and people working full-time on it.

My second paragraph in GP was a reply in kind to your…

“This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.”

In retrospect, I shouldn’t have said it. But it’s also quite disappointing that your several paragraphs of reply largely doubled down on ad hominem attacks against anyone who disagrees with you (e.g., by implying they all follow a prophet without thinking; I’d say many would be capable of reaching similar conclusions on their own).

Even Yann LeCun and other top researchers who disagree with the current AI safety programs were not so dismissive of the concerns. Note that many other top AI researchers do have concerns themselves. Bengio and Russell are some examples. I’ll stop here since it’s likely unproductive to continue.

taneq 1464 days ago

I'm sure a strongly superhuman general AI would fall for this obvious trick. Yep.