Hacker News new | ask | show | jobs
by toozitax 3 hours ago
If the web runners return summarized results and those are still treated as untrusted, what's stopping a summary itself from carrying the injection up to the main loop?
1 comments

It's defense in depth, definitely not a silver bullet. The web runner has no access to wider capabilities outside of that page. So the only path for a prompt injection to do anything is to try and get itself included in the summary, and get the main loop to act on an instruction in that summary. That means getting pass two sets of <untrusted> tags and explicit instructions to treat everything inside as information, not instruction. Then the egress checkpoint and allow/deny whitelists are the final guards regardless of what the main loop decides to do. Trying to harden wherever I can if you have any recs.