by lamerose
913 days ago
Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want with the complexity of the process of producing that output. Getting superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of inferior quality to what a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions. If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might even be able to execute her order more successfully. Superalignment is nothing less mundane than this. Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications to be inputted, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?
Why would that be the case?
A big part of the worry around AI alignment is exactly that this seems very hard when you try to do it. We are used to interacting with other humans, who implicitly share almost all our background assumptions when we communicate with them. The same is not the case for a computer program.
E.g. if you're holding a basketball and I tell you "throw it to me", you implicitly understand that I mean for you to throw it:
1. To my hands, or to some area that makes it easy to catch.
2. At a speed that lets it reach me, but not so hard that it hurts me.
3. Without bouncing it off something that will break on the way, even if the ball still reaches me.
etc.
These are all background assumptions, and I know they're hard to actually specify because smart people have spent twenty years trying to figure out the math to do this and say it's hard.
Also, if you think those are contrived examples, note that the closest thing we have to building an AGI right now is just building software in general. And I don't think I'll shock anyone by saying that "getting software to do what you want, without bugs" is... hard. I think there are almost literally no software systems today that don't have bugs in them.