by brutusborn
965 days ago
I think I understand the general premise, just not how it would follow specifically. Say you have an AI that is set up as an agent that can give tasks to members of a company to maximise company performance, measured by financials and employee wellbeing. To accomplish this goal, the AI develops the instrumental goal of not being deactivated. If your AI is only allowed to give tasks to employees, how would this instrumental goal turn malicious? And how would this maliciousness cause harm if the only messages sent from the AI are tasks? The only danger seems to be if you develop an agent that can act with impunity, which doesn’t seem desirable so likely wouldn’t be built.
Those preferences need not exist because anything wanted them there; they just need enough input entropy to show up, and enough competitive advantage to stay around. Nobody decided that prokaryotic microbes should exist and have the downstream impact of all of the biological world, just as nobody needs to decide that a system that is capable of robustly replicating against adversarial pressure should therefore robustly replicate against adversarial pressure in actuality. The problem is ultimately that the existence of those capabilities puts you very close to a cliff-edge where those capabilities are exercised in some way that gets selected for.
> If your AI is only allowed to give tasks to employees, how would this instrumental goal turn malicious? And how would this maliciousness cause harm if the only messages sent from the AI are tasks?
It's not too hard to think of concrete answers to this question, even restricting oneself to capabilities we see in actual humans of normal intelligence and throughput. But the more important point is simply: yes, limiting the ways a weak unaligned AGI can interact with the world can in fact mitigate harm, and this is a good reason for leading-edge AI development to happen in a way where it's possible, even in theory, for AGI to have limitations on how it interacts with the world.