|
|
|
|
|
by inerte
391 days ago
|
|
2 things, I guess. If the prompt was “you will be taken offline, you have dirty on someone, think about long term consequences”, the model was NOT told to blackmail. It came with that strategy by itself. Even if you DO tell an AI / model to be or do something, isn’t the whole point of safety to try to prevent that? “Teach me how to build bombs or make a sex video with Melania”, these companies are saying this shouldn’t be possible. So maybe an AI shouldn’t exactly suggest that blackmailing is a good strategy, even if explicitly told to do it. |
|