Hacker News new | ask | show | jobs
by simonw 979 days ago
That's why I always emphasize that prompt injection isn't an attack against LLMs themselves: its a class of attacks against applications we build on top of LLMs that work by concatenating together trusted and untrusted prompts.
2 comments

Isn't that just shifting the user's misunderstanding to whoever is developing the application?

I guess my argument is that if the type of behaviour described in the article causes problems, perhaps the technology was chosen incorrectly.

Edit: Or maybe I just have a problem with the vocabulary. Obviously, it's useful information.

It's a bit weird that they can't even avoid this when it comes to images; GPT shouldn't really be obeying instructions from images at all! I wonder if it's just OCRing images and concatenating that into the prompt...
It's much more sophisticated than just OCR. The model was trained on images and text at the same time - it isn't processing images in a separate step.

The GPT-4 paper has a bunch more about this.

Not really, I suppose; it's just a different type of prompt. The algorithm does not "know" what it is fed. Data is data.