| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by kccqzy 298 days ago
	I think Simon has proposed breaking the lethal trifecta by having two LLMs, where the first has access to untrusted data but cannot do any actions, and the second LLM has privileges but only abstract variables from the first LLM not the content. See https://simonwillison.net/2023/Apr/25/dual-llm-pattern/ It is rather similar to your option (b).

2 comments

maximilianthe1 298 days ago

Can't the attacker then jailbreak the first LLM to generate jailbreak with actions for the second one?

link

dfabulich 298 days ago

If you read the fine article, you'll see that the approach includes a non-LLM controller managing structured communication between the Privileged LLM (allowed to perform actions) and the Quarantined LLM (only allowed to produce structured data, which is assumed to be tainted).

See also CaMeL https://simonwillison.net/2025/Apr/11/camel/ which incorporates a type system to track tainted data from the Quarantined LLM, ensuring that the Privileged LLM can't even see tainted _data_ until it's been reviewed by a human user. (But this can induce user fatigue as the user is forced to manually approve all the data that the Privileged LLM can access.)

link

yencabulator 297 days ago

"Structured data" is kind of the wrong description for what Simon proposes. JSON is structured but can smuggle a string with the attack inside it. Simon's proposal is smarter than that.

link

j45 298 days ago

One would have to be relatively invisible.

Non-deterministic security feels like a relatively new area.

link

arthurcolle 298 days ago

Yes they can

link

ares623 298 days ago

Hmm so we need 3 LLMs

link

zwnow 298 days ago