Hacker News new | ask | show | jobs
by mememememememo 82 days ago
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

But I don't think that is the only problem.

You could also convince an agent to rm -r / even if that agent can't communicate out.

Even pure LLM and web you could phish someone in a more sophisticated way using details from their chat histort in the attack.

1 comments

Yes, I of course link to this post, which I think is great. But I think actually it understates the case. All three parts of the trifecta (untrusted content, private data and external comms) are not necessary. Really, the key problem is just untrusted content in the context window. Access to private data and the ability to communicate externally are just modalities in which damage can occur.

For example: imagine having just untrusted content and private data (2/3 parts of the trifecta). The untrusted content can use a "Disregard that!" attack to cause the LLM to falsely modify the private data. So I think the whole "trifecta" is not necessary and the key thing is that you simply can't have untrusted stuff in your context window at any point.

Oh yeah. I think simonw has created good vocab to talk about attacks but the trifecta is just one way to attack.

The difecta is:

* LLM can do something you'd rather it not.

* LLM reads untrusted text.