|
|
|
|
|
by axlprose
939 days ago
|
|
Disincentivizing it from saying mean things just strengthens it's agreeableness, and inadvertently incentivizes it to acquire social engineering skills. It's potential to cause havoc doesn't go away, it just teaches AI how to interact with us without raising suspicions, while simultaneously limiting our ability to prompt/control it. |
|