Hacker News
by lamerose 913 days ago
Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want, with the complexity of the process of producing that output. Getting superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of inferior quality to what a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions.

If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might even be able to execute her order more successfully. Superalignment is nothing less mundane than this.

Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications as input, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?
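To make the traditional-computing half of that claim concrete, here is a rough Python sketch. The choice of sorting and of the Rule 30 cellular automaton is my own illustration, and the function names are hypothetical. The specification of sorting fits in one line, while even a textbook implementation is several times longer. Going the other way, a one-line update rule produces output that is notoriously hard to characterize.

```python
from collections import Counter


def meets_spec(inp, out):
    """The whole specification: `out` is `inp` rearranged into non-decreasing order."""
    return Counter(inp) == Counter(out) and all(a <= b for a, b in zip(out, out[1:]))


def merge_sort(xs):
    """One possible implementation; noticeably longer than the spec it satisfies."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def rule30_step(cells):
    """A simple process: new cell = left XOR (center OR right), with wraparound."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]


if __name__ == "__main__":
    data = [3, 1, 2]
    print(meets_spec(data, merge_sort(data)))

    # Iterate the simple rule from a single live cell and print the rows.
    row = [0] * 15
    row[7] = 1
    for _ in range(7):
        print("".join("#" if c else "." for c in row))
        row = rule30_step(row)
```

Neither half says anything about ML directly. It only shows that the size of a specification and the size of the process meeting it can come apart in both directions.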

3 comments

> Getting superintelligent AI to understand simple specifications should be a non-issue.

Why would that be the case?

A big part of the worry around AI-alignment is exactly because this seems very hard when you try to do it. We are used to interacting with other humans, who implicitly share almost all our background assumptions when we communicate with them. The same is not the case for a computer program.

E.g. if you're holding a basketball and I tell you "throw it to me", you implicitly understand that I mean for you to throw it:

1. To my hands, or to some area that makes it easy for me to catch it.

2. Hard enough that it reaches me, but not so hard that it hurts me.

3. Without bouncing it off something that will break on the way to me, even if the ball still reaches me.

etc.

These are all background assumptions, and I know they're hard to actually specify because smart people have spent twenty years trying to figure out the math to do this and say it's hard.
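As a toy illustration of the basketball example (a sketch of my own, not anything from the alignment literature), here is what happens when an optimizer only sees the literal request. The `Throw` fields, the candidate throws, and the effort numbers are all made up for illustration. The point is only that the literal objective and the intended one can pick different winners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throw:
    name: str
    reaches_me: bool        # the only thing "throw it to me" literally asks for
    catchable: bool         # background assumption 1
    hurts_me: bool          # background assumption 2
    breaks_something: bool  # background assumption 3
    effort: float           # hypothetical cost to the thrower, lower is cheaper


def literal_score(t: Throw) -> float:
    """Rewards only what was said out loud, minus a small effort penalty."""
    return (1.0 if t.reaches_me else 0.0) - 0.1 * t.effort


def intended_score(t: Throw) -> float:
    """Same reward, but any violated background assumption rules the throw out."""
    if not (t.reaches_me and t.catchable) or t.hurts_me or t.breaks_something:
        return float("-inf")
    return 1.0 - 0.1 * t.effort


# Made-up candidates; the effort values are arbitrary.
CANDIDATES = [
    Throw("gentle chest pass", True, True, False, False, effort=0.5),
    Throw("fastball at the face", True, False, True, False, effort=0.3),
    Throw("ricochet off the window", True, False, False, True, effort=0.1),
    Throw("drop it on the floor", False, False, False, False, effort=0.0),
]

best_literal = max(CANDIDATES, key=literal_score)
best_intended = max(CANDIDATES, key=intended_score)

if __name__ == "__main__":
    print("literal objective picks: ", best_literal.name)
    print("intended objective picks:", best_intended.name)
```

With these made-up numbers the literal objective prefers the window ricochet, because nothing in "throw it to me" says not to. Writing `intended_score` was easy here only because I listed three assumptions in advance. The "etc." in the list above is the hard part.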

Also, if you think those are contrived examples - note that the closest thing we have to building an AGI right now is just building software, in general. And I think I won't shock anyone by saying that "getting software to do what you want, without bugs" is... hard. I think there are almost literally no software systems today that don't have bugs in them.

> I think there are almost literally no software systems today that don't have bugs in them.

Programs that have been formally verified with something like Coq can be bug free. Automating formal verification may be a more effective way to solve the trust issue in this domain.

Yes, alignment is difficult in itself, but why would aligning a more advanced AI be any harder than what has already been done for current AI?

That's a good part of the problem. But it's not the whole problem.

The issue is the trained chef or dad doing "what's best" and, I don't know, using high-fiber macaroni instead of the good stuff. The higher intelligence knows best, and has its (sorry their) own agenda. Perhaps the agenda is their own, or it's a hodge podge mush of what has been trained as "good" - and that's not any better.

Beyond that is the "genie" problem - where the genie perfectly understands the request and still will find a way to mess it up.

Is your point that a more intelligent AI would develop a more entangled measure of what is good, requiring more specific alignment to be overcome; by way of analogy, are chefs harder to instruct precisely because of their prior expertise? I guess some chefs are like that, but I think it results from personality issues, not structural ones. I find describing an AI as having its own agenda to be a presumptive personification.

My point is mostly the agenda. I can see a machine having an agenda - even if that agenda is not human or not even understandable. You can call it a reward function, but that's giving a lot of credit to the programmers, who are most likely too far removed from the agenda. Is the machine just answering questions? Well, no. If it has cycles to talk to itself (or to two buddies) in the course of pursuing scientific research, then perhaps this becomes the agenda (at the expense of other things). That's part of the point: IF the machine develops an agenda, then what?

But "knowing best" could be a problem anyway.

And I expect that if we spend a few more minutes we can think of other ways for the situation to go "oops". Oh here is one: two humans / human entities conflicting on giving instructions. Machine soon enough "on its own".

So I don't think "more specific alignment" can cut it - if we posit a super-human AGI with ways to act on the world. It would have to be more fundamental, because at some point one oops is not recoverable. Three laws or something? Heh.

Ok, those are some good points about what can go wrong. I still doubt that things are particularly more prone to going wrong in more intelligent systems. Wasn't it early, simplistic systems like Tay that went the furthest off the rails? The problem is that more intelligent AI will be used more ambitiously, so when it does go wrong, the consequences might be more serious than some racist Twitter posts.

Right. Hedge fund going global threat? That wasn't purely a machine. But none of this needs to be purely a machine. And it sure got far before people reined it back in.

And I don't know that "more intelligent" is necessary. I can see plenty of mirth coming from an amateur or hacker / techies group or less responsible country (hah!) using whatever commercial offer to bake their own agent. What's harder? The core. Hooking up the core to a wallet, internet access, a robo-signing staff - and working around the fine print of the core vendor - that might be much easier than what OpenAI and friends are trying to do. Do they also create their own reward function and alignment in there? Yes. That's part of the fun. That's the point. Do they get it right? Maybe, maybe not.

> Is it fair to say that alignment is just the task of getting an AI to understand your intentions

To understand and not violate them. In other words, it's about aligning the values the AI uses to guide its decision procedures with those of the humans who are operating it.