by recallingmemory
233 days ago
How does ChatGPT Atlas address the concerns Anthropic found? https://www.anthropic.com/news/claude-for-chrome

"Prompt injection attacks can cause AIs to delete files, steal data, or make financial transactions. This isn't speculation: we’ve run “red-teaming” experiments to test Claude for Chrome and, without mitigations, we’ve found some concerning results.

We conducted extensive adversarial prompt injection testing, evaluating 123 test cases representing 29 different attack scenarios. Browser use without our safety mitigations showed a 23.6% attack success rate when deliberately targeted by malicious actors.

One example of a successful attack—before our new defenses were applied—was a malicious email claiming that, for security reasons, emails needed to be deleted. When processing the inbox, Claude followed these instructions to delete the user’s emails without confirmation."