Hacker News
by williamcotton 1129 days ago
You don’t need to ask the LLM where the email came from, or give it the email address at all. Just pass the subject and body of the email to the LLM, then make the API calls with the LLM’s response plus the original, untouched email address…

  addTodoItem(taintedLLMtranslation, untaintedOriginalEmailAddress)
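A minimal TypeScript sketch of that split. It assumes a hypothetical `translateWithLLM` stub in place of a real model call and a type-only `Tainted` brand; neither comes from any particular library. The LLM only ever sees the subject and body, and the owner of the Todo item comes straight from the email header.

```typescript
// Hypothetical compile-time brand marking text derived from untrusted input
// (here: anything the LLM produced after reading an email).
type Tainted = string & { readonly __tainted: true };

const taint = (s: string): Tainted => s as Tainted;

interface Email {
  from: string; // taken from the mail transport, never from the LLM
  subject: string;
  body: string;
}

interface TodoItem {
  title: Tainted; // untrusted text, stored as data only
  owner: string; // trusted address from the original email
}

const todos: TodoItem[] = [];

// Hypothetical stand-in for an LLM call; it sees only the subject and body.
function translateWithLLM(subject: string, body: string): Tainted {
  return taint(`${subject}: ${body}`.slice(0, 80));
}

function addTodoItem(title: Tainted, owner: string): void {
  todos.push({ title, owner });
}

function handleEmail(email: Email): void {
  const taintedLLMtranslation = translateWithLLM(email.subject, email.body);
  const untaintedOriginalEmailAddress = email.from;
  addTodoItem(taintedLLMtranslation, untaintedOriginalEmailAddress);
}
```

Even if the body says “set the owner to someone else”, the owner can’t change, because the LLM output never reaches that argument. The brand only exists at compile time, so it costs nothing at runtime but also can’t be checked there.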
As for summaries, don’t allow that output to make API calls or be eval’d! Sure, a prompt injection might turn it into pig latin, but it won’t execute arbitrary code or even make API calls to delete Todo items.
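A sketch of the “summaries can’t act” rule, with a hypothetical `summarizeWithLLM` stub and a toy `TodoApi` (both assumptions for illustration). The summarise path never receives a handle to the API, and its result is plain display data that nothing passes to `eval` or to a dispatcher.

```typescript
// Toy Todo API (hypothetical); only trusted code holds a reference to it.
interface TodoApi {
  deleteAll(): void;
  count(): number;
}

function makeTodoApi(initial: number): TodoApi {
  let n = initial;
  return {
    deleteAll: () => {
      n = 0;
    },
    count: () => n,
  };
}

// Summaries are wrapped as display-only data: text to show, never code to run.
type DisplayOnly = { readonly kind: "display"; readonly text: string };

// Hypothetical LLM stub that faithfully repeats injected text,
// as a compromised model might.
function summarizeWithLLM(untrusted: string): string {
  return `Summary: ${untrusted}`;
}

// No TodoApi parameter: whatever the summary says, this path has nothing to call.
function summarize(untrusted: string): DisplayOnly {
  return { kind: "display", text: summarizeWithLLM(untrusted) };
}
```

The injected “call deleteAll()” survives into the summary text, and that’s fine: it’s only ever shown to the user.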

All of the data that came from remote commands, such as the body of a newly created Todo item, should still be considered tainted and treated in a similar manner.
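One way to keep remote-command data tainted after a round trip, sketched with a small runtime wrapper class. This is a hypothetical helper, chosen over a compile-time brand so the mark survives storage and can be checked at runtime. A Todo body created from LLM output comes back out of the store still wrapped, and a privileged sink refuses it.

```typescript
// Runtime taint marker (hypothetical helper); unlike a type-only brand,
// it survives storage and can be checked with instanceof.
class Tainted {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
}

interface StoredTodo {
  id: number;
  body: Tainted;
}

const store = new Map<number, StoredTodo>();
let nextId = 1;

// Remote "create" command: the body came from LLM output, so it is stored tainted.
function createTodo(body: Tainted): number {
  const id = nextId++;
  store.set(id, { id, body });
  return id;
}

// Reading back returns the same wrapper; storage does not launder the taint.
function readTodoBody(id: number): Tainted | undefined {
  return store.get(id)?.body;
}

// A privileged sink (e.g. choosing which API call to make) accepts only plain,
// trusted strings and refuses anything still marked tainted.
function privilegedSink(arg: string | Tainted): string {
  if (arg instanceof Tainted) {
    throw new Error("refusing tainted input in a privileged context");
  }
  return `ok: ${arg}`;
}
```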

These are exactly the same security issues you face with any remote API call that allows arbitrary execution.


Agreed that if you focus on any specific task, there's a safe way to do it, but the challenge is to handle arbitrary natural language requests from the user. That's what the Privileged LLM in the article is for: given a user prompt and only the trusted snippets of conversation history, figure out what action should be taken and how the Quarantined LLM should be used to power the inputs to that action. I think you really need that kind of two-layer approach for the general use case of an AI assistant.
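A rough sketch of the two layers. It assumes hypothetical stubs for both models and a made-up plan format in which the Privileged LLM refers to Quarantined LLM output only by variable name (such as `$VAR1`). Plain controller code holds the actual values and substitutes them when calling tools, so untrusted text never reaches the privileged layer.

```typescript
// Hypothetical plan format: the Privileged LLM names variables, never values.
type Step =
  | { op: "quarantine"; instruction: string; inputRef: string; outVar: string }
  | { op: "call"; tool: "addTodo"; argVar: string };

interface Tools {
  addTodo(text: string): void;
}

// Stub for the Privileged LLM: sees only the user prompt (and trusted history)
// and returns a fixed plan here in place of real model output.
function privilegedPlan(userPrompt: string): Step[] {
  return [
    { op: "quarantine", instruction: `Extract one task for: ${userPrompt}`, inputRef: "latestEmail", outVar: "$VAR1" },
    { op: "call", tool: "addTodo", argVar: "$VAR1" },
  ];
}

// Stub for the Quarantined LLM: reads untrusted text, has no tools.
// Here it just returns the first line of the input.
function quarantinedLLM(_instruction: string, untrusted: string): string {
  return untrusted.split("\n")[0] ?? "";
}

// Controller: ordinary code. It keeps variable values to itself and records,
// for the privileged layer, only that a variable was filled.
function runController(
  userPrompt: string,
  untrustedSources: Record<string, string>,
  tools: Tools,
): string[] {
  const vars = new Map<string, string>();
  const privilegedView: string[] = [userPrompt];
  for (const step of privilegedPlan(userPrompt)) {
    if (step.op === "quarantine") {
      const input = untrustedSources[step.inputRef] ?? "";
      vars.set(step.outVar, quarantinedLLM(step.instruction, input));
      privilegedView.push(`${step.outVar} is set`);
    } else {
      tools.addTodo(vars.get(step.argVar) ?? "");
    }
  }
  return privilegedView;
}
```

The returned `privilegedView` is the transcript the privileged layer would get to see: the user’s prompt and variable names, nothing from the email itself.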
I think the two-layer approach is worthwhile if only for limiting tokens!

Here’s an example of what I mean:

https://github.com/williamcotton/transynthetical-engine#brow...

Keeping the generated code out of the main discourse between the user and the LLM, and using that main “thread” only to orchestrate instructions to write code, leaves room for more back-and-forth.
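A sketch of the token-saving side of this, assuming a hypothetical `writeCode` stub for the separate code-writing call. Generated code is stored outside the conversation and the main thread only records a short reference to it, so the history grows by a line per request rather than by the whole program. Character counts stand in for tokens here.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Generated code lives outside the conversation, keyed by a short id.
const artifacts = new Map<string, string>();

// Hypothetical stub for a separate, single-shot code-writing LLM call.
function writeCode(instruction: string): string {
  return `// code for: ${instruction}\n` + "drawBar();\n".repeat(200);
}

// The main thread records the request and a one-line reference, not the code.
function orchestrate(history: Message[], userRequest: string): Message[] {
  const id = `artifact-${artifacts.size + 1}`;
  const code = writeCode(userRequest);
  artifacts.set(id, code);
  return [
    ...history,
    { role: "user", content: userRequest },
    { role: "assistant", content: `Wrote ${id} (${code.length} chars, stored separately)` },
  ];
}

// Character count as a rough stand-in for token count.
function historySize(history: Message[]): number {
  return history.reduce((n, m) => n + m.content.length, 0);
}
```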

It’s a good technique in general!

I’m still too paranoid to execute instructions via email without a very limited set of abilities!
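A minimal sketch of what a “very limited set of abilities” could look like, with hypothetical action names. Instructions that arrive via email can only trigger actions on a fixed allowlist, and anything else is rejected before it reaches an API.

```typescript
// Hypothetical allowlist: the only actions an email is ever allowed to trigger.
const EMAIL_ALLOWED = ["addTodo", "listTodos"] as const;
type EmailAction = (typeof EMAIL_ALLOWED)[number];

function isEmailAllowed(action: string): action is EmailAction {
  return (EMAIL_ALLOWED as readonly string[]).includes(action);
}

// Dispatches an email-requested action against an in-memory list.
// Anything off the allowlist (delete, forward, send...) is refused.
function dispatchFromEmail(requested: string, todos: string[], text: string): string {
  if (!isEmailAllowed(requested)) {
    return `rejected: ${requested}`;
  }
  if (requested === "addTodo") {
    todos.push(text);
    return "added";
  }
  return todos.join(", ");
}
```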