by dijksterhuis 80 days ago
As someone who worked on “prompt injection” before it was called “prompt injection” for an (unfinished) phd…

yeah, there is only one surefire 100% fix for “prompt injection”: use deterministic solutions, i.e. not machine learning.

----

addendum in case someone tries to make this commonly made point -- i don't use deterministic here to mean "i've pinned the ML model weights after training". i use it in reference to the probability theory stuff of training/models (the boring and complicated maths stuff).

Yeah there's no solution to prompt injection, but prompt injection in itself is not a security risk. It's about what you give the LLM access to. You can give it access to your complete DB and APIs, or you can only allow it to operate on a very specific piece of data.

> but prompt injection in itself is not a security risk

sorry, but this is wrong. the only time “prompt injection” is not a security risk is if no data is ever passed as input to a model, or the model output has no bearing on anything in the world anywhere.

in which case, why bother with the model/system in the first place?

you can exclude this security risk from your threat model. but what you’re saying there is that you don’t believe it’s likely that someone would want to or could run an attack, or that you believe an attack would have sufficiently low impact/severity. possibly because you’ve put mitigations in place …

> It's about what you give the LLM access to. You can give it access to your complete DB and APIs, or you can only allow it to operate on a very specific piece of data.

aha! mitigations!

limiting blast radius and/or limiting access to model input. neither of which removes the security risk, but they do reduce the possible impact/severity and/or likelihood.
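
to make the blast-radius point concrete, here is a rough sketch of a per-deployment tool allowlist. everything in it (`ToolScope`, `read_ticket`, `drop_table`, `support_bot_scope`) is a made-up name for illustration, not any real framework's API. it does not stop an injected prompt from asking for a dangerous tool, it only means the request fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical tools; a real system would wrap actual APIs here.
def read_ticket(ticket_id: str) -> str:
    return f"contents of ticket {ticket_id}"

def drop_table(name: str) -> str:
    return f"dropped {name}"

ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "read_ticket": read_ticket,
    "drop_table": drop_table,
}

@dataclass(frozen=True)
class ToolScope:
    """The set of tools one model deployment is allowed to invoke."""
    allowed: FrozenSet[str]

    def call(self, tool_name: str, arg: str) -> str:
        # The model's requested tool name is untrusted input, so it is
        # checked against the allowlist before anything is dispatched.
        if tool_name not in self.allowed:
            raise PermissionError(f"tool {tool_name!r} is outside this scope")
        return ALL_TOOLS[tool_name](arg)

# Assumed deployment: a support bot that may only read tickets.
support_bot_scope = ToolScope(allowed=frozenset({"read_ticket"}))
```

with this scope, `support_bot_scope.call("read_ticket", "42")` succeeds and `support_bot_scope.call("drop_table", "users")` raises `PermissionError`. the impact is smaller, the risk is still there.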

it’s all about the threat model ;)

It's only a mitigation if you have already given the AI access. If you use it to, for example, generate some data from research on specific data you gave it, you can feed the AI's output back to a human for verification and let the human have the final say. I think that's the main flow we should use AI for.
Model output that has seen user input is user input. User input can be dealt with securely.
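
One way to read "model output is user input" is to pass it through the same handling you would use for a form field. A minimal sketch, assuming the model's output ends up in a SQL lookup; the `users` table and `lookup_user` helper are hypothetical, and this only covers the downstream query, not what the model was talked into producing.

```python
import sqlite3

def lookup_user(conn: sqlite3.Connection, model_output: str) -> list:
    # Treat the model's output exactly like untrusted user input:
    # bind it as a parameter instead of formatting it into the SQL string.
    cursor = conn.execute(
        "SELECT name FROM users WHERE name = ?", (model_output,)
    )
    return cursor.fetchall()

# Throwaway in-memory database for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Output from a model that has seen attacker-controlled text.
injected = "alice' OR '1'='1"

benign_rows = lookup_user(conn, "alice")
injected_rows = lookup_user(conn, injected)
```

Because the value is bound rather than interpolated, the injected string is compared as a literal name and matches nothing, while the benign lookup still returns the `alice` row.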