HN Mirror


by creatonez 297 days ago

> 2. We have a critic LLM that assesses among other things whether the website content is leading a non-aligned initiative. This is still subject to the LLM intelligence, but it's a first step.

> [...]

> 4. These attacks are starting to resemble social engineering attacks. There may be opportunities to shift some of the preventative approaches to the LLM world.

With current tech, if you get to the point where these mitigations are the last line of defense, you've entered the zone of security theater. These browser agents simply cannot be trusted. The best assumption you can make is that they will do a mixture of random actions and evil actions. Everything downstream of them must be hardened to withstand both random and evil actions, and I really think marketing material should be honest about this reality.

1 comment

antves 297 days ago

I agree, these mitigations alone can't be sufficient, but they are all necessary within a wider framework.

The only way to make this kind of agent safe is to work on every layer. Part of it is teaching the underlying model to see the dangers, part of it is building stronger critics, and part of it is hardening the systems they connect to. These aren’t alternatives; we need all of them.
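
To make the "hardening the systems they connect to" layer concrete, here is a minimal sketch of a deny-by-default gate that sits between an agent and whatever executes its actions. Everything in it is an assumption for illustration: `AgentAction`, `is_permitted`, the action names, and the `docs.example.com` allowlist are hypothetical and not taken from any product discussed in this thread. The point is only that the check runs outside the model, so it holds whether the agent's output is random or malicious.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_HOSTS = {"docs.example.com"}
ALLOWED_ACTIONS = {"navigate", "read"}  # anything not listed is refused


@dataclass(frozen=True)
class AgentAction:
    """An action proposed by the agent; treated as untrusted input."""
    kind: str
    url: str


def is_permitted(action: AgentAction) -> bool:
    """Deny by default. The agent's own reasoning is never consulted."""
    if action.kind not in ALLOWED_ACTIONS:
        return False
    parsed = urlparse(action.url)
    # Require HTTPS and an exact hostname match against the allowlist.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

A gate like this does not replace model training or critic LLMs. It only bounds the damage when those layers fail, and a real deployment would likely need much finer-grained rules than a host and action allowlist.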
