My main takeaway message is: models (even opus4.6) do not follow security "instructions" reliably. In OpenClaw, they added security warnings, tags, random IDs... None of these countermeasures work reliably. Even sandboxing can be escaped (not in the classical sense using vulnerabilities, but using multi-layered prompt injection payload with natural language only)[0].
As soon as untrusted content is injected in the context, do not trust any actions downstream.
CaMeL is imho safer, but hard to implement into modern agents like OpenClaw. Its core idea is that a privileged LLM plans from the (trusted) user request only, while a restricted interpreter executes that plan (and enforces policies). Untrusted content is parsed separately and is not fed back into the privileged LLM.
Modern agents are useful exactly because they run a feedback loop (observe, reason, adapt, use tools, repeat). CaMeL breaks that loop, which improves security but makes it a poor fit for highly general agents like OpenClaw.