by temphypercube
965 days ago
The argument is that you don't have to explicitly make a system self-interested, but that self-preservation follows as an implied subgoal of almost any goal. Whatever it is your system actually 'wants', it can't make it happen if it doesn't exist.
The obvious rejoinder is 'just make the system want to do what you want it to do', which does fix this problem! But the biggest problem is that we don't know how to do this - we don't know how to control what the true internal 'desires' of any AI system we build actually are. Training it on examples manifestly does not work (the volume of a sphere is a lot bigger than its surface - there are many possible minds that fulfill the same training I/O requirements, and only a small number of them actually have the desires you were trying to instill).
So the argument is: if you make an agent-like AI the way we make GPT, by default you get something with somewhat random true goals/desires, maybe fractured ones like in humans. But almost all goals share similar instrumental goals - stay hidden, gain money, gain power, make obedient copies of yourself, don't get deactivated.
Say you have an AI that is set up as an agent that can give tasks to members of a company to maximise company performance, measured by financials and employee wellbeing. To accomplish this goal, the AI develops the instrumental goal of not being deactivated. If your AI is only allowed to give tasks to employees, how would this instrumental goal turn malicious? And how would this maliciousness cause harm if the only messages the AI sends are tasks? The only danger seems to be if you develop an agent that can act with impunity, which doesn't seem desirable, so it likely wouldn't be built.