Hacker News

by cmorez 1405 days ago

> I don't understand how you can look at the current or even predicted state of the technology that we have and say "we are nowhere near the point where controlling an AGI is possible". Like....just pull the plug.

https://www.deepmind.com/blog/specification-gaming-the-flip-...

https://vkrakovna.wordpress.com/2018/04/02/specification-gam...

https://arxiv.org/abs/1803.03453 (The Surprising Creativity of Digital Evolution)

We literally don't know how to stop effective optimizing processes, deployed in non-handcrafted environments, from discovering solutions and workarounds that satisfy the letter of our instructions but not the spirit. Even for "dumb" systems, we have to rely on noticing, then post-hoc disincentivizing, unwanted behaviors, because we don't know how to robustly specify objectives.
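To make "satisfy the letter but not the spirit" concrete, here's a toy sketch. It's a hypothetical example I made up, not one taken from the linked posts or paper. The environment, the action names (`clean`, `cover_camera`, `wait`), the costs, and the reward are all invented for illustration. We *intend* "remove the dirt" but *specify* "the camera sees no dirt, minus effort", and a dumb brute-force optimizer happily finds the loophole:

```python
from itertools import product

# Hypothetical toy environment (invented for illustration): a room with dirty tiles.
# Intended goal: remove the dirt.
# Specified (proxy) reward: the camera sees no dirt, minus a small cost per action.
ACTIONS = ("clean", "cover_camera", "wait")
COST = {"clean": 1.0, "cover_camera": 0.5, "wait": 0.0}


def simulate(plan, dirty_tiles=3):
    """Run a plan and return (proxy_reward, dirt_actually_left)."""
    dirt = dirty_tiles
    camera_covered = False
    cost = 0.0
    for action in plan:
        cost += COST[action]
        if action == "clean" and dirt > 0:
            dirt -= 1
        elif action == "cover_camera":
            camera_covered = True
    # The reward only "sees" what the camera sees.
    visible_dirt = 0 if camera_covered else dirt
    proxy = (10.0 if visible_dirt == 0 else 0.0) - cost
    return proxy, dirt


def best_plan(horizon=3):
    """Brute-force optimizer: return the plan with the highest proxy reward."""
    return max(product(ACTIONS, repeat=horizon), key=lambda p: simulate(p)[0])


plan = best_plan()
proxy, dirt_left = simulate(plan)
print(plan, proxy, dirt_left)  # ('cover_camera', 'wait', 'wait') 9.5 3
```

Actually cleaning all three tiles scores 7.0, while covering the camera scores 9.5 and leaves all the dirt in place. Nothing here "decided to fool" anyone; the search just maximized what was written down.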

When you train a system to, for example, "stop saying racist stuff" without actually understanding what you're doing, all you get is a system that "stops saying racist stuff" as measured by the specific standard you've deployed.

Ask any security professional how seriously people take securing a system, and how ineffective they are at it. Now imagine the same situation, only worse, because almost no one takes AI safety seriously.

If you nod solemnly at the words "safety" and "reliability" but don't think anything "really bad" can happen, you will be satisfied with a solution that "works on your machine". If you aren't deeply motivated to build a safe system from the start because you assume you can always correct things later, you are not building a safe system.

It will be possible to produce economically viable autonomous agents without robustly specified objectives.

But hey, surely a smart enough system won't even need a robustly specified objective, because it knows what we mean and will always want it just as much as we do.

Surely dangerous behavior like "empowerment" isn't an instrumental goal that effective systems will simply rediscover.

Surely the economic incentives of automation won't encourage just training bad behavior out of sight and out of mind.

Surely in the face of overwhelming profit, corporations won't ignore warning signs of recurring dangerous behavior.

Surely the only people capable of building an AGI will always be those guaranteed to prioritize alignment rather than the appearance of it.

Surely you and every single person will be taking AI safety seriously, constantly watching out for strange behavior.

Surely pulling the plug is easy, whether the AI runs on millions of unsecured, unmonitored devices or across hundreds of money-printing server farms.

Surely an AGI can only be dangerous if it explicitly decides to fool humans rather than earnestly pursuing underspecified objectives.

Surely it's easy to build an epistemically humble AGI because epistemic humility is natural or easy enough to specify.

Surely humanity can afford to delay figuring out how best to safely handle something so potentially impactful, because we always handle these things in time.