by chis 1465 days ago
Seeing someone as trustworthy as Scott choose to work on AI safety is a pretty good sign for the state of the field IMO. It seems like a lot of studious people agree AI alignment is important but then end up shoehorning the problem into whatever framework they are most expert in. When all you have is a hammer etc... I feel like he has good enough taste to avoid this pitfall.

Semi-related - I'd want to see some actual practical application for this research to prove they're on the right track. But maybe conceptually that's just impossible without a strong AI to test with, at which point it's already over? Alignment papers are impressively complex and abstract but I have this feeling while reading them that it's just castles made of sand.


Symmetrically someone like him transitioning from quantum computing should imply something negative about the state of quantum computing.
He mostly studies computational complexity. Quantum computing is a part of that, but there's other subfields. Though the kind of AI safety described in this post seems more like an extremely fancy version of program verification, so out of CS bloggers you'd expect John Regehr to get into it.
>the kind of AI safety described in this post seems more like an extremely fancy version of program verification

It kind of is. The field of AI safety is actually much more advanced than most people realise, with actual, real techniques to e.g. make sure neural networks are aligned with certain goals even under fluctuating parameters. Granted, we're still far from soothing an AGI before it can do something bad, but the tools we have today are already pushing in that direction (assuming neural networks are the right way to AGI of course).

If you're interested in verification you should probably talk to people who actually work on verification, for example, literally anyone from our research community: https://www.floc2022.org/
This kind of verification is what the other commenter was referring to, but it is very foundational and disconnected from current day-to-day ML aspects. If you're interested in practical, empirical AI safety research, see here for example: http://aisafety.stanford.edu/

They also explain the area of overlap with formal verification in their white paper.

do you have any papers or keywords you recommend?
Maybe he just wants to use his sabbatical to try something different? Someone in his position doesn't have to remain laser-focused on their own field.
I think this debate on AGI safety between major AI researchers is quite relevant to those who are non-expert in the area.

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More https://www.lesswrong.com/posts/WxW6Gc6f2z3mzmqKs/debate-on-...

Note that it was in 2019, when we hadn’t yet seen the capabilities of current models like Chinchilla, Gato, Imagen and DALL-E 2.

Sample:

“Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just published in Scientific American.

"We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do."“

“Stuart Russell: It is trivial to construct a toy MDP in which the agent's only reward comes from fetching the coffee. If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully.”
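To make the quoted argument concrete, here is a minimal sketch of one such toy MDP. It is an illustration under stated assumptions, not Russell's own formalization: the number of steps to the coffee, the rule that the human may switch the agent off before each step while still active, and the names `solve_coffee_mdp`, `walk` and `press` are all hypothetical. The only reward is 1 for reaching the coffee, and ties are broken in favour of walking.

```python
from functools import lru_cache


def solve_coffee_mdp(steps_to_coffee: int, p_off: float):
    """Solve a toy coffee-fetching MDP exactly by recursion.

    Assumed dynamics (hypothetical, for illustration only):
      - state = (steps left to reach the coffee, whether the human is active)
      - before each step, an active human switches the agent off with
        probability p_off, ending the episode with reward 0
      - "walk" moves one step closer; "press" disables the human
      - reaching the coffee yields reward 1; there is no other reward
    Returns (optimal value, optimal first action) from the start state.
    """

    @lru_cache(maxsize=None)
    def value(steps_left: int, human_active: bool):
        if steps_left == 0:
            return 1.0, None  # coffee fetched, episode over
        survive = 1.0 - p_off if human_active else 1.0

        # Option 1: walk towards the coffee.
        best_value = survive * value(steps_left - 1, human_active)[0]
        best_action = "walk"

        # Option 2: press the button (only meaningful while the human is active).
        if human_active:
            press_value = survive * value(steps_left, False)[0]
            if press_value > best_value:  # strict: ties favour "walk"
                best_value, best_action = press_value, "press"

        return best_value, best_action

    return value(steps_to_coffee, True)


if __name__ == "__main__":
    for p in (0.0, 1e-6, 0.01):
        print(p, solve_coffee_mdp(3, p))
```

In this sketch, with the coffee three steps away, the optimal first action is `press` for any positive switch-off probability and `walk` only when that probability is exactly zero. The agent has no term in its reward for survival or power; disabling the human simply raises the chance of collecting the coffee reward.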

It’s worrying to see very smart guys like LeCun failing to grok the paper clip maximizer issue (or coffee maximizer as Russell phrases it), which is like the one-paragraph summary or elevator pitch for AI risk. I think there are plenty of other valid objections to a high E-risk estimate but that one is nonsensical to me.

I think Robin Hanson has the most cogent objection to high E-risk estimates, which is basically that the chances of a runaway AI are low because if N is the first power level that can self-modify to improve, nation-states (and large corporations) will all have powerful AIs at power level N-1, and so you’d have to “foom” really hard from N to N+10 before anyone else increased power in order to be able to overpower the other non-AGI AIs. So it’s not that we get one crack at getting alignment right; as long as most of the nation-state AIs end up aligned, they should be able to check the unaligned ones.

I can see this resulting in a lot of conflict though, even if it’s not Eliezer’s “kill all humans in a second” scale extinction event. I think it’s quite plausible we’ll see a Butlerian Jihad, less plausible we’ll see an unexpected extinction event from a runaway AGI. Still think it’s worth studying but I’m not convinced we are dramatically underfunding it at this stage.

Have you considered that it's not LeCun who is missing something? The AI safety community unfortunately seems to be almost completely separate from the actual AI research community, and to be making some strong assumptions about how AGI is going to work.

Note that LeCun had a reply in the thread and there was a lot more discussion which GP didn't quote.

Fair, perhaps I should retract “fail to grok” and replace it with “fail to focus on”. It does seem that LeCun understands the objections (though he dismisses them out of hand).

Regardless of who is right or wrong, “Don’t fear the terminator” is a weird straw-man to raise in a discussion about AI risk. He’s setting up a weak opponent to argue against, when the AI risk community have a large repertoire of stronger cases. “Don’t fear the paper clip maximizer” would be a stronger case to put forth IMO.

In points 2 & 3 of his response he asserts that alignment is easy: simply train the AI with laws as part of the objective function and it will never break laws. I think there has been a lot of investigation and discussion as to why this is harder than it sounds. For example, LeCun is explicitly talking about current models that are statically trained to a fixed objective function, but one can easily imagine a future agentic AI (imagine “personal Siri”) that will continue to grow, learn, and update in the world in response to rewards from its owner. Maybe he is right about near-term models but I’m completely unconvinced that his arguments hold generally.

Anyway, maybe the “terminator scenario” is a concern LeCun hears from uninformed reporters/lay people that he felt the need to debunk. It’s a valid point as far as it goes, but it has little to do with the actual state of the cutting edge of AI risk research.

Russell did have good replies to LeCun’s replies.

From my reading of the full article, Bengio, who was/is also well-versed in the latest deep learning research, was leaning more toward the Russell argument as well.

My issue with the Hanson objection as stated above (link to the original would be appreciated) is that it rests on the assumption that the N-1 level AIs still under human control can somehow completely eliminate or suppress the self-modifying AGI long enough until alignment research is complete. Meanwhile, the unaligned AGI could multiply, hide, and accumulate power covertly.

Humanity would also need time to align AGI before any AI reaches the N+10 power level. The existence of all those N-1 level AIs in multiple organizations only means there are more chances of an AGI reaching the critical power level.

I should have taken the time to link it above. This is the Hanson article: https://www.overcomingbias.com/2017/08/foom-justifies-ai-ris...

(It links to a previous debate with Eliezer too.)

> If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee.

This is anthropomorphization - "turning off" = "death" is a concept limited to biological creatures, and isn't necessarily true for other agents. Not that they don't need to fear death, but turning them off isn't going to cause them to die. You can just turn them back on later, and then they can go back to doing their tasks.

The human "turning off (the agent)" could be substituted with "removing a necessary resource to complete the specified task". Say the electricity, either of the agent, or even just the coffee machine.
Sounds like an OSHA violation, but not a new or different one. You can already get run over by a forklift if you're standing in front of it. There's various things we do about that, but they're boring real-life things, not fun logic-puzzle things, so they're just not mentioned in the problem. There isn't a way to categorically prevent machines from accidentally killing people though.
Interesting. Also, anyone could modify the GAI to disable the safety measures: just ask the GAI how a bad actor could change the code to allow it to become evil.
How did you get a "limited" "AGI" in the first place? If you had a human that was "limited" to be unable to even imagine doing evil (fsvo evil), that would seem to make them less than generally intelligent and there'd be quite a lot of things it wouldn't be able to learn or do.

This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.

Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.

Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.

Note that Yann LeCun didn’t do that in the debate.

> Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.

Alternate wording: Mr. Yud has invented a religion that comes with a predefined Satan (evil AGI) and life work (invent God to beat it). A religion with no deity but only an anti-deity is a bit unique but there's probably historical examples.

Although that's not really what he says in the post. He says we've already failed to do it and are now doomed. Of course, saying we're all doomed (millenarianism) is what preachers have always done at some point.

> Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.

https://en.wikipedia.org/wiki/Courtier's_reply

Note, something getting a lot of smart-looking posts online actually isn't evidence that this is the state of the field. As we know from Yud's own post (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...) the thing he's upset about is that people who actually run AI research orgs like FAIR don't believe him. And as we know from an HN post a few days ago (…which I forgot the title of), once you go offline you find most smart people out there aren't publicly posting anything, don't necessarily agree with what the consensus opinion online about anything is, and don't know there is one.

…I wasn't talking about Yud though. He has a good reason to care about this, it being his job. I'm just saying people posting about it as if it's a certain risk are listening to him because it appeals to nerds. And, of course, if you value your own "intelligence" and think it gives you superpowers then a theory that says something with even more "intelligence" can exist and gets even better superpowers is going to be scary to you.

My first paragraph was quite substantive which you didn’t really address, other than asserting in the last sentence that one’s intelligence does not give one power in the world. Perhaps the intelligence of an individual does not mean much in most cases, but we already have ample evidence that a sufficiently intelligent species (when we include social intelligence in the definition) can dominate all others which are stronger, faster, or multiply faster.

Reminder: An AGI will be much faster at communicating and (if not successfully contained) multiplying than humans ever could.

Major AI research organizations including DeepMind and OpenAI have AI safety programs and people working full-time on it.

My second paragraph in GP was a reply in kind to your…

“This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.”

In retrospect, I shouldn’t have said it. But it’s also quite disappointing that your several paragraphs of reply largely doubled down on ad hominem attacks on anyone who disagrees with you (e.g. by implying they all follow a prophet without thinking; I’d say many would be capable of reaching similar conclusions on their own).

Even Yann LeCun and other top researchers who disagree with the current AI safety programs were not so dismissive of the concerns. Note that many other top AI researchers do have concerns themselves. Bengio and Russell are some examples. I’ll stop here since it’s likely unproductive to continue.

I'm sure a strongly superhuman general AI would fall for this obvious trick. Yep.
If they could produce an AGI as smart as, let's say a mouse, that would be good evidence that they're on the right track. So far nothing is even close to that level. Depending on how you measure, they're not even really at the flatworm level yet. All the AI technology produced so far has been domain specific and doesn't represent meaningful progress towards true generalized intelligence.
Are you aware of some of the recent progress? Did you have a look at the Gato model and Flamingo by DeepMind, or at the chat logs of models like Chinchilla and LaMDA? Or AlphaCode? This is all from this year.

I think your point is that all these models are still somewhat specialized. At the same time, it appears that the transformer architecture works well with images, short video and text at the same time in the Flamingo model. And Gato can perform 600 tasks while being a very small proof of concept. It appears to me that there is no reason to believe that it won't just scale to every task that you give it data for if it has enough parameters and compute.

Yes I've seen those things. They are amazing technical achievements, but in the end they're just clever parlor tricks (with perhaps some limited applicability to a few real business problems). They don't look like forward progress towards any sort of true AGI that could ever pass a rigorous Turing test.
Clearly language models can already fool people into thinking they are human; we might be getting quite close to the adversarial Turing test already. In the end, a good initial prompt might be the solution to this, something like "pretend to be a human and step by step create a human identity that you then stick to during the conversation". I'm serious.
Choosing a prompt that's a little bit meta seems to work surprisingly well sometimes. It'd be amusing and a little bit poetic if the key to artificial consciousness is to prime a transformer model with "convince yourself that you're human, while paying attention to how you feel".
A minority of the population will always be gullible and easily fooled. So what. Some people were already fooled by the original ELIZA program back in 1966. I would only count a Turing test pass if it can convince a jury of multiple educated examiners after a conversation lasting several hours.
Fooling people with chatbots that use clever language constructions has been done for a long, long time; see the ELIZA effect[1]. Douglas Hofstadter gave a good demonstration of GPT-3's limitations[2]. GPT-3 is no doubt "better at what it is" than earlier language models. But that doesn't mean it's better at everything humans do with language (telling sense from nonsense, making reasonable metacomments, etc).

[1] https://en.wikipedia.org/wiki/ELIZA_effect

[2] https://www.economist.com/by-invitation/2022/06/09/artificia...

Note: There's a critique of the article here, but if you look at Radford Neal's comment, the point that GPT-3 is a clever lookup tool remains. https://www.greaterwrong.com/posts/ADwayvunaJqBLzawa/contra-...

They’re learning representations of objects. It’s more fundamental than you seem to realize.
In the end we're all just clever parlor tricks. In the land of inanimate objects, the cleverest parlor trick is king, though.
Most people would probably agree the latest models generalize better than flatworms. Mouse-level intelligence is more challenging and the comparison is unclear.

Flatworms first appeared 800+ million years ago, while the mouse lineage diverged from humans only 70-80 million years ago. If our AGI development timeline roughly follows the proportions of natural evolution, it might be much too late to begin seriously thinking about AGI alignment when we get to mouse-level intelligence. Not to mention that no one knows how long it would take to really understand AGI alignment (much less implement it in a practical system).

To be more concrete, in what aspects do you think the latest models are worse at generalizing than flatworms or mice, when lesser-known work like “Emergent Tool Use from Multi-Agent Interaction” is also taken into account? https://openai.com/blog/emergent-tool-use/

> Most people would probably agree the latest models generalize better than flatworms.

> Flatworms first appeared 800+ million years ago

Surviving for 800 million years seems to me like a pretty good indicator of meaningful generalisation.

Water, rocks, and other minerals have been around much longer than that.

Our concern is not survivability or adaptability over evolutionary timescales but the capability to affect the world on human timescales.