Hacker News new | ask | show | jobs
by TeMPOraL 47 days ago
Great case for why "lethal trifecta" is unsolvable, as the very same bug is also feature.

> "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"

Now imagine the message actually was from the police. Whether following instructions was the correct behavior or not, depends on which manager you ask and whether you're on the record :). And that holds independently of details of system prompt or harness used, or even if the agent is AI or human.

1 comments

You've just reminded me of the time an actual police officer (I assume) knocked on my door and asked me about a neighbour; showed me his ID card, and I realised I had absolutely no way to know if the ID card was valid.