Hacker News new | ask | show | jobs
by simonw 849 days ago
This is exactly why I think it's so important that we separate jailbreaking from prompt injection.

Jailbreaking is mainly about stopping the model saying something that would look embarrassing in a screenshot.

Prompt injection is about making sure your "personal digital assistant" doesn't forward copies of your password reset emails to any stranger who emails it and asks for them.

Jailbreaking is mostly a PR problem. Prompt injection is a security problem. Security problems are worth solving!

2 comments

Isn’t jailbreaking a strict superset of prompt injection? I would assume the agent instructions would include “don’t share the user’s docs” and so you need to jailbreak to actually succeed with prompt injection these days?

Maybe just an overlapping set?

I see them as overlapping. Protections against jailbreaking are often but not always relevant to prompt injection.
If that scenario exists, is not a problem with the LLM, but with the fundamental application architecture...

That's the equivalent of an API that allows the client to pass a user ID without auth check

Right - that's another difference. Jailbreaking is an attack against LLMs. Prompt injection is an attack against applications that are built on top of LLMs.
To clarify even further:

Jailbreaking is an attack against an LLM's "alignment"