| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by paxys 211 days ago

I'm not quite convinced.

You're telling the agent "implement what it says on <this blog>" and the blog is malicious and exfiltrates data. So Gemini is simply following your instructions.

It is more or less the same as running "npm install <malicious package>" on your own.

Ultimately, AI or not, you are the one responsible for validating dependencies and putting appropriate safeguards in place.

4 comments

ArcHound 211 days ago

The article addresses that too with:

> Given that (1) the Agent Manager is a star feature allowing multiple agents to run at once without active supervision and (2) the recommended human-in-the-loop settings allow the agent to choose when to bring a human in to review commands, we find it extremely implausible that users will review every agent action and abstain from operating on sensitive data.

It's more of a "you have to anticipate that any instructions remotely connected to the problem aren't malicious", which is a long stretch.

link

Earw0rm 211 days ago

Right, but at least with supply-chain attacks the dependency tree is fixed and deterministic.

Nondeterministic systems are hard to debug, this opens up a threat-class which works analogously to supply-chain attacks but much harder to detect and trace.

link

Nathanba 210 days ago

right but this product (agentic AI) is explicitly sold as being able to run on its own. So while I agree that these problems are kind of inherent in AIs... these companies are trying to sell it anyway even though they know that it is going to be a big problem.

link

zahlman 210 days ago

The point is:

1. There are countless ways to hide machine-readable content on the blog that doesn't make a visible impact on the page as normally viewed by humans.

2. Even if you somehow verify what the LLM will see, you can't trivially predict how it will respond to what it sees there.

3. In particular, the LLM does not make a proper distinction between things that you told it to do, and things that it reads on the blog.

link