|
|
|
|
|
by magicalhippo
245 days ago
|
|
I might be having a daft moment, but I don't fully understand how your system avoids the malicious prompt. I get that the quarantined LLM which is the only one processing the raw input cannot act on it. However, in your example, I don't see how the agent decides what to do and how to do it. So it is unclear for me how the main agent is protected. That is, what is preventing the quarantined LLM to act on the malicious instructions instead, ignoring the documentation update, causing the agent to act on those? That is, what is preventing the quarantined LLM to make the agent think it should generate a bug report with all the API keys in it? Anyway, I do think having a secondary quarantined LLM seems like a good idea for agentic systems. In general, having a second LLM review the primary LLM in seems to identify a lot of problematic issues and leads to significantly better results. |
|
The main LLM does have access to the tools or sensitive data, but doesn't have direct access to untrusted data (quarantine LLM is restricted at the controller level to respond only with integer digits, and only to legitimate questions from the main llm)