| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by gregatragenet3 668 days ago

Great, I would love to get some of the prompts you have in mind and try them with my library and see the results.

Do you have recommendations on more effective alternatives to prevent prompt attacks?

I don't believe we should just throw up our hands and do nothing. No solution will be perfect, but we should strive to a solution that's better than doing nothing.

2 comments

simonw 668 days ago

“Do you have recommendations on more effective alternatives to prevent prompt attacks?”

I wish I did! I’ve been trying to find good options for nearly two years now.

My current opinion is that prompt injections remain unsolved, and you should design software under the assumption that anyone who can inject more than a sentence or two of tokens into your prompt can gain total control of what comes back in the response.

So the best approach is to limit the blast radius for if something goes wrong: https://simonwillison.net/2023/Dec/20/mitigate-prompt-inject...

“No solution will be perfect, but we should strive to a solution that's better than doing nothing.”

I disagree with that. We need a perfect solution because this is a security vulnerability, with adversarial attackers trying to exploit it.

If we patched SQL injection vulnerability with something that only worked 99% of the time all of our systems would be hacked to pieces!

A solution that isn’t perfect will give people a false sense of security, and will result in them designing and deploying systems that are inherently insecure and cannot be fixed.

link

gregatragenet3 667 days ago

I look at it like antivirus - it's not perfect, and 0-days will sneak by (more-so at first while the defenses are not matured) but it is still better to have it than not.

You do bring up a good point which is what /is/ the effectiveness of these defensive type measures? I just found a benchmarking tool, which I'll use to get a measure on how effective these defenses can actually be - https://github.com/lakeraai/pint-benchmark

link

yifanl 668 days ago

My personal lack of imagination (but I could very much be wrong!) tells me that there's no way to prevent prompt injection without losing the main benefit of accepting prompts as input in the first place - If we could enumerate a known whitelist before shipping, then there's no need for prompts, at most it'd be just mapping natural language to user actions within your app.

link