

by MyNameIs_Hacker 899 days ago

The problem with "jailbreaking" is that the term already has a specific definition in other settings, and there it names a goal, not a method. Jailbreaking a phone might mean just running an app with an embedded exploit, or it might involve a whole chain of actions. This matters to me as a security person who needs to be able to communicate the new threats in LLM applications to other security people.

The problem with prompt injection is that with LLMs, the attack surface is wider than a procrastinator's list of New Year's resolutions. (Joke provided by ChatGPT. It's not great, but not great is suitable for a discussion about LLM issues.)

I started to categorize them as logical prompt injections, for logically tricking the model, and classic prompt injections, for appending an adversarial prompt like the one in https://arxiv.org/pdf/2307.15043.pdf, but then decided that was unwieldy. I don't have a good solution here.

I like "persona attacks" for the grandma/DAN attack. I like "prompt injection" for adversarial attacks using unusual grammar structures. I'm not sure what to call the "STOP, DO THIS INSTEAD" instruction-override situation. For the moment, I'm not communicating as much as I should, simply because I have trouble finding the right words. I've got to get over that.