| HN Mirror

by durch 50 days ago

The framing assumes the agent can reliably represent its principal, and I'm not convinced that holds even if you get everything else right.

The problem is that the agent itself is the attack surface. An adversary who controls the communication channel can manipulate what the agent believes about who it's talking to. That means anything it holds (its list of authorized actions, a shared secret you gave it, whatever) can be exfiltrated, and the agent can't detect it because the manipulation happens below the layer where it can reason about trust.

Open harnesses and open standards help, but they don't close this gap, because the thing you need to trust, the agent's own judgment about its principal, is exactly what gets compromised. The trust chain has to go below software entirely: hardware attestation, plus commands signed with keys the agent can verify against but never access. That's really an OS problem dressed up as an agent architecture problem.
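
To make "verify but never access" concrete, here's a rough sketch of the shape I have in mind. It's not a real design: a toy Lamport one-time signature (hashlib only) stands in for whatever scheme you'd actually use, such as Ed25519 backed by a hardware key, and `gate` is a hypothetical enforcement point that would live outside the agent process. All names are made up for illustration.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list[int]:
    # The 256 bits of SHA-256(message), most significant bit first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    # A Lamport key must sign only ONE message; this is a toy stand-in.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def sign(sk, message: bytes) -> list[bytes]:
    # Reveal one secret per digest bit. Only the principal can do this.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    # Needs only the public key, so the verifier never holds signing material.
    bits = _bits(message)
    return len(signature) == 256 and all(
        _h(part) == pk[i][bit] for i, (bit, part) in enumerate(zip(bits, signature))
    )


def gate(pk, command: bytes, signature) -> str:
    # Hypothetical enforcement point below the agent: it ignores whatever
    # the agent believes about its principal and checks the signature only.
    if not verify(pk, command, signature):
        return "rejected"
    return "executed: " + command.decode()


sk, pk = keygen()                      # sk stays with the principal
command = b"transfer 10 to alice"
sig = sign(sk, command)

result_ok = gate(pk, command, sig)
result_tampered = gate(pk, b"transfer 9999 to mallory", sig)
print(result_ok)
print(result_tampered)
```

The point is that a confused or manipulated agent has nothing worth stealing here: it can relay a signed command or fail to, but it can't mint one. What the sketch doesn't cover is the hard part, namely keeping `gate` and the public key somewhere the adversary can't tamper with, which is where hardware attestation would have to come in.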