| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by simonw 849 days ago

This is exactly why I think it's so important that we separate jailbreaking from prompt injection.

Jailbreaking is mainly about stopping the model saying something that would look embarrassing in a screenshot.

Prompt injection is about making sure your "personal digital assistant" doesn't forward copies of your password reset emails to any stranger who emails it and asks for them.

Jailbreaking is mostly a PR problem. Prompt injection is a security problem. Security problems are worth solving!

2 comments

theptip 849 days ago

Isn’t jailbreaking a strict superset of prompt injection? I would assume the agent instructions would include “don’t share the user’s docs” and so you need to jailbreak to actually succeed with prompt injection these days?

Maybe just an overlapping set?

link

simonw 849 days ago

I see them as overlapping. Protections against jailbreaking are often but not always relevant to prompt injection.

link

cjonas 849 days ago

If that scenario exists, is not a problem with the LLM, but with the fundamental application architecture...

That's the equivalent of an API that allows the client to pass a user ID without auth check

link

simonw 849 days ago

Right - that's another difference. Jailbreaking is an attack against LLMs. Prompt injection is an attack against applications that are built on top of LLMs.

link

cjonas 849 days ago

To clarify even further:

Jailbreaking is an attack against an LLM's "alignment"

link