|
|
|
|
|
by ben_w
48 days ago
|
|
Less difference than you may expect. If you do anthropomorphise them like this, consider it from the PoV of a manager: "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"
Current AI are more gullible, for sure. We wanted fully automated luxury space communism, we got fully automated mediocre gullibility. |
|
> "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"
Now imagine the message actually was from the police. Whether following instructions was the correct behavior or not, depends on which manager you ask and whether you're on the record :). And that holds independently of details of system prompt or harness used, or even if the agent is AI or human.