Hacker News
by robbensinger 3848 days ago
Stuart Russell (co-author of AI:MA, one of MIRI's research advisors) argues at http://edge.org/conversation/the-myth-of-ai#26015 that AI systems with "the ability to make high-quality decisions" (where "quality refers to the expected outcome utility of actions taken" and the utility function is represented in the system's programmed decision criteria) raise two problems:

"1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

"2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task."

The first of these is what Bostrom calls "perverse instantiation" and Dietterich and Horvitz call the "Sorcerer's Apprentice" problem (http://cacm.acm.org/magazines/2015/10/192386-rise-of-concern...). The second is what Bostrom calls "convergent instrumental goals" and Omohundro calls "basic AI drives."

The first of these seems like a fairly obvious problem, if we think AI systems will ever be trusted with making important decisions. Human goals are complicated, and even a superintelligent system that can easily learn about our goals won't necessarily adopt those goals as a result. So solving the AI problem doesn't get us a solution to the goal specification problem for free.
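To make the gap between "programmed decision criteria" and what we actually want concrete, here is a toy sketch. Everything in it is a made-up assumption for illustration (the cleaning scenario, the action names, the numbers, the two utility functions); none of it comes from Russell, Bostrom, or Dietterich and Horvitz. It only shows that an optimizer maximizing a slightly wrong criterion can prefer an action nobody intended.

```python
# Hypothetical toy world: each action maps to (actual_mess, visible_mess, effort).
# All names and numbers are illustrative assumptions.
OUTCOMES = {
    "do_nothing": (10, 10, 0),
    "clean_up": (0, 0, 5),
    "cover_with_rug": (10, 0, 1),
}


def programmed_utility(actual_mess, visible_mess, effort):
    # What the designers wrote down: penalize mess the sensors can see.
    return -visible_mess - effort


def intended_utility(actual_mess, visible_mess, effort):
    # What the designers meant: penalize mess that actually exists.
    return -actual_mess - effort


def best_action(utility):
    # Pick the action whose outcome scores highest under the given utility.
    return max(OUTCOMES, key=lambda action: utility(*OUTCOMES[action]))


if __name__ == "__main__":
    print("programmed criterion picks:", best_action(programmed_utility))
    print("intended criterion picks:", best_action(intended_utility))
```

The two criteria agree on "do_nothing" versus "clean_up" and only come apart on the action the designers didn't think about, which is roughly the shape of the worry.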

The second of these also has some intuitive force; https://intelligence.org/?p=12234 shows Omohundro's idea can be stated formally, so it's not purely sci-fi. Averting the "Sorcerer's Apprentice" problem in full generality would mean averting this problem, since we'd then simply be able to give AI systems the right goals and let them go wild. Absent that, if AI systems become much more cognitively capable than humans, we'll probably need to actively work on some approach that violates Omohundro's assumptions (and the assumptions of the formalism above). Bostrom and MIRI both talk about a lot of interesting ideas along these lines.

1 comments

What is "an AI system with the ability to make high-quality decisions"? Do automated derivative trading models count? Do systems which decide how much to bid on an RTB ad exchange count?

The first problem is not new. We have a similar problem with some corporations, for example.

"A sufficiently capable intelligent system" is as real as "sufficiently hostile aliens". It's hard to argue and reason about a fictional system with an assortment of properties picked by someone aiming to spread fear.

The problems aren't completely unprecedented (else we'd have basically no knowledge about them), but they become more severe in the scenarios Bostrom/Russell/etc. are talking about.

I would say that the central concern is with notional systems that can form detailed, accurate models of the world and efficiently search through the space of policies that can be expected to produce a given outcome according to the model. This can be a recommender system that tells other agents what policies to adopt, or it can execute the policies itself.
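For a sense of what "search through the space of policies according to a model" means at the smallest possible scale, here is a sketch. The world model, the two actions, the probabilities, and the function names are all assumptions invented for this example, and a "policy" here is just a fixed action sequence rather than anything state-dependent. It is a brute-force enumeration, not a claim about how a capable system would actually be built.

```python
from itertools import product

# Hypothetical action set for the toy world.
ACTIONS = ("work", "acquire")


def step(state, action):
    """Toy world model: return (probability, next_state) pairs.

    state is (progress, resources). All numbers are made up.
    """
    progress, resources = state
    if action == "acquire":
        return [(1.0, (progress, resources + 1))]
    # "work" succeeds more often when the agent holds more resources.
    p_success = min(1.0, 0.2 + 0.2 * resources)
    return [
        (p_success, (progress + 1, resources)),
        (1.0 - p_success, (progress, resources)),
    ]


def utility(state):
    # The programmed criterion only counts progress on the assigned task.
    progress, _resources = state
    return progress


def expected_utility(policy, state=(0, 0)):
    # Expected final utility of running a fixed action sequence under the model.
    if not policy:
        return utility(state)
    return sum(
        prob * expected_utility(policy[1:], next_state)
        for prob, next_state in step(state, policy[0])
    )


def best_policy(horizon):
    # Exhaustive search over all action sequences of the given length.
    return max(product(ACTIONS, repeat=horizon), key=expected_utility)


if __name__ == "__main__":
    policy = best_policy(5)
    print(policy, expected_utility(policy))
```

In this toy, the utility function never mentions resources, yet with a horizon of 5 the highest-scoring sequence spends its first two steps on "acquire" before doing any "work". That is only an artifact of numbers I picked, but it is the kind of behavior the second of Russell's two problems is pointing at.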

If the search process through policies is sufficiently counter-intuitive and opaque to operator inspection, the "Sorcerer's Apprentice" problem becomes much more severe than it is in ordinary software. As the system becomes more capable, it can look increasingly safe and useful in its current context and yet remain brittle in the face of changes to itself and its environment. This is also where convergent instrumental goals become more concerning, because systems with imperfectly understood/designed policy selection criteria (introducing an element of randomness, from our perspective) seem likely to converge on adversarial policies due to the general fact of resource limitations.

There's no reason to think this kind of system is inevitable, but it's worth investigating how likely we are to be able to develop superhuman planning/decision agents, on what timescale, and whether there are any actions we could take in advance to make it possible to use such systems safely. At this point not enough research-hours have gone into this topic to justify any strong conclusions about whether we can (or can't) make much progress today.

http://givewell.org/labs/causes/ai-risk gives a good summary of this topic.