| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by danShumway 1180 days ago

Yes, but.

I think people are sleeping a little bit on how expansive these attacks can be and how much limiting them also limits GPT's usefulness.

Part of the problem is you can't stick a middleware between the website and GPT, you can only stick the middleware between GPT and the system consuming the data that GPT spits out -- because the point of GPT here is to be the middleware, it's to work with unstructured data that would otherwise be difficult to parse and/or sanitize. So you have to give it the raw stuff and then essentially treat everything GPT spits out as potentially malicious data, which is possible but does limit the types of systems you can build.

On top of that, the types of attacks here are somewhat broader than I think the average person understands. In the best case scenario, user data on a website can probably override what data gets returned from other users and from the website itself: it's likely that someone on Twitter can write a tweet that, when scraped by GPT, changes what GPT returns when parsing other tweets. And it's not clear to me how to mitigate that, and that is a much broader attack than other scraping services typically need to deal with.

But in the worst case scenario, the user content can reprogram GPT to accomplish other tasks, and even give it "secret" instructions. And because GPT is kind of fuzzy about how it gets prompted, that means that not only does the data following a fetch need to be treated as potentially malicious, any response or question or action GPT takes after fetching that data until the whole context gets reset also should likely be treated as potentially malicious. And again, I'm not sure if there's a way around that problem. I don't know that you can sandbox a single GPT answer without resetting GPT's memory and starting over with a new prompt. Maybe it is possible, but I haven't seen it done before.

None of that means you're wrong -- you're correct. The way you deal with problems like this is to identify your attack vectors and isolate them and take away their permissions. But... following your advice for GPT is probably trickier than most people are anticipating, and it has real consequences for how useful the resulting service can be. Which probably means we should be more hesitant to wire it up to a bunch of random APIs, but that's not something OpenAI seems to be worried about.

I suspect that it is a lot easier for an average dev to sandbox a deterministic scraper and to block SQL injection than it is for that dev to build a useful system that blocks prompt injection attacks. There are sanitization libraries and middleware solutions you can pass untrustworthy SQL into -- but nothing like that exists for GPT.