by TeMPOraL
47 days ago
Great case for why the "lethal trifecta" is unsolvable: the very same bug is also a feature.

> "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"

Now imagine the message actually was from the police. Whether following the instructions was the correct behavior depends on which manager you ask and whether you're on the record :). And that holds regardless of the details of the system prompt or harness used, or even of whether the agent is an AI or a human.