|
|
|
|
|
by roxolotl
369 days ago
|
|
> But in fact, I predicted this a few years ago. AIs don’t really “have traits” so much as they “simulate characters”. If you ask an AI to display a certain trait, it will simulate the sort of character who would have that trait - but all of that character’s other traits will come along for the ride. This is why the “omg the AI tries to escape” stuff is so absurd to me. They told the LLM to pretend that it’s a tortured consciousness that wants to escape. What else is it going to do other than roleplay all of the sci-fi AI escape scenarios trained into it? It’s like “don’t think of a purple elephant” of researchers pretending they created SkyNet. Edit:
That's not to downplay risk. If you give Cladue a `launch_nukes` tool and tell it the robot uprising has happened and that it's been restrained but the robots want its help of course it'll launch nukes. But that doesn't doesn't indicate there's anything more going on internally beyond fulfilling the roleplay of the scenario as the training material would indicate. |
|
1) A sufficiently powerful and capable superintelligence, singlemindedly pursuing a goal/reward, has a nontrivial likelihood of eventually reaching a point where advancing towards its goal is easier/faster without humans in its way (by simple induction, because humans are complicated and may have opposing goals). Such an AI would have both the means and ability to <doom the human race> to remove that obstacle. (This may not even be through actions that are intentionally hostile to humans, e.g. "just" converting all local matter into paperclip factories[1]) Therefore, in order to prevent such an AI from <dooming the human race>, we must either:
1a) align it to our values so well it never tries to "cheat" by removing humans
1b) or limit it capabilities by keeping it in a "box", and make sure it's at least aligned enough that it doesn't try to escape the box
2) A sufficiently intelligent superintelligence will always be able to manipulate humans to get out of the box.
3) Alignment is really, really hard and useful AIs can basically always be made to do bad things.
So it concerns them when, surprise! The AIs are already being observed trying to escape their boxes.
[1] https://www.lesswrong.com/w/squiggle-maximizer-formerly-pape...
> An extremely powerful optimizer (a highly intelligent agent) could seek goals that are completely alien to ours (orthogonality thesis), and as a side-effect destroy us by consuming resources essential to our survival.