I've been experimenting with Claude Code, ChatGPT Agent, and OpenClaw to perform more open-ended tasks for me online. A big blocker I've hit on shopping and research tasks is the agent getting a key piece of info wrong. For example, in one case, my agent added a brand I don't like to the cart because the site flagged it as almost sold out.

The HN crowd is probably pretty aware of these threats and can avoid them while browsing. But what about their agents?

I tried prompting, but it was ineffective: once the AI had seen the threat, its context was already polluted/distracted.

Looking at the research, I came across a couple of papers, SusBench and Decepticon. The benchmarks in this deception research indicate that models with increased reasoning can perform worse, because the model rationalizes the dark pattern.

So it seems the best approach is to remove the information before it can pollute/poison the context.

In my day job, we have a browser extension that started as a productivity extension. However, contact centers started using it to neutralize insider, fraud, and social engineering threats. So my team set out to create a browser extension to neutralize the threats AI agents face.

We're focusing on open-ended tasks, because the best practice for routine tasks is to have the agent script repeated actions.

It's also a tricky area since AI agents view the web in different ways: DOM, a11y tree, and visually. So we needed to account for those differences in how we detect and neutralize threats.

The extension we created is agent-browser-shield, which defends against three primary threats:

- Prompt Injection
- Dark Patterns
- Context Pollution

It's free and source-available on GitHub, ClawHub, and the Chrome Web Store: https://github.com/pixiebrix/agent-browser-shield

We plan on making an enterprise version that pairs with our low-code engine so teams can easily create custom rules for business-specific sites and internal tools.

Looking forward to feedback! Especially curious if anyone has agent traces that got poisoned or sites to red team against!