Hacker News new | ask | show | jobs
by gsquaredxc 15 days ago
I have a hard time viewing prompt injection as malware. LLMs are unpredictable and there are many different prompts that can unintentionally cause unexpected behavior. It’s probably closer to a memory canary in that it tries to get malformed programs to blow up early.
3 comments

prompt injection is taught now in cyber security courses, so I think it's fair to say it's regarded as malicious
Malicious maybe, malware no. Not leaving your password as a sticky note on your work computer is presumably also taught in those same courses. I wouldn’t call someone typing in that password malware. If IT comes around and tries the password and then forces you to reset it it’s not even classified as malicious.
I suppose it's watering down the term a bit; but the term is derived from "malicious software", and this is software, and I think it's behaving maliciously.
Calling prompt injection "not malware" because LLM behavior is unpredictable is like saying a phishing email is not an attack because humans are unpredictable.

Even if maybe the mechanism of "injecting a prompt" could be beneficial in some use-cases, e.g. to instruct an LLM positively, this is case is clearly malicious by intent. The author even tried to hide it by obfuscation.

It's just an insane take by that libraries author. Even someone "on their side", that may even hate AI/LLMs more than him, would probably drop that library in a heartbeat, as the authors judgement clearly can't be trusted.

    Calling prompt injection "not malware" … is like saying a phishing email is not [malware] …
I would say phishing emails are not malware, I think most people would agree that phishing emails are not malware, and if pressed to defend this point on its own merits I would say something like “they are deceptive instructions that rely on a human executing them to do harm”. I think the “phishing” analogy supports the case for not calling it malware (it is a different, also bad thing).
They did not call phishing, but their point still stands. A phishing email is malicious, and if you see this kind of prompt injection as malicious, then I don't think it's a stretch to call software that engages in malicious prompt injectic malware
It's malware for the mind. The same way that malware tricks the CPU into doing something it wasn't supposed to do, phishing tricks humans into doing something they didn't want to do.
How do you “trick” a CPU? Malware deceives people, not a CPU.
Undefined behaviour, out of bounds memory access, memory corruption, code injection, privilege escalation...

To be precise, the CPU is doing exactly what's supposed to do, but the logic of the algorithms are subverted so that they perform in unintended ways to give leverage to a malicious actor. I hope this clarifies what I meant with this.

Does anyone remember the early 2000s joke virus emails? The ones that are variations on "This is a <outgroup> computer virus. As we don't have software engineers to write the code to do this automatically, please kindly forward this email to everyone in your address book then format your hard drive."

This is exactly as much malware as those were.

Please, for the love of all that is good, can we just try not to build and defend a world where, on encountering text like that, /your computer immediately follows the instructions/? Can we just all agree that such a world would be bad for everyone involved and using an LLM that risks doing this, with no container or guardrails, is at least as problematic as running an unpatched open email relay was back then?

It's just as bad as a CPU acting on malicious instructions. We need to create safeguards for llms too, it's just that this is not the way to do things.
> This is exactly as much malware as those were.

A joke virus email is a sign saying "please throw yourself down the stairs."

An obfuscated prompt injection that tries to delete data is someone greasing the stairs and turning off the lights.

Both rely on the environment being unsafe, but only one is deliberately trying to make the failure happen.

Lol, is a virus not malware when it crashes because someone wrote some assembly for the wrong platform?