by simonw
948 days ago
Honestly, that's the million (billion?) dollar question at the moment. LLMs are inherently insecure, primarily because they are inherently /gullible/. They need to be gullible to be useful, but this means any application that exposes them to text from untrusted sources (e.g. "summarize this web page") could be subverted by a malicious attacker. We've been talking about prompt injection for 14 months now and we don't yet have anything that feels close to a reliable fix. I really hope someone figures this out soon, or a lot of the stuff we want to build with LLMs won't be feasible to build in a secure way.

Like, these things can detect when you're trying to trick them into talking dirty. Getting them to second-guess whether you're literally using coercive tricks straight from the domestic violence handbook shouldn't be that much of a stretch.