Hacker News new | ask | show | jobs
by sureglymop 444 days ago
In this case the result/output is plain text. Since it's not code it may be harder to imagine an attack vector. As an attacker, here would be some of my capabilities/possibilities:

- I could change the meaning of the output and the output entirely. - If I can control one part of a larger set of data that is analyzed , I could influence the whole output. - I could try to make the process take forever in order to waste resources.

I'd say the first scenario is most interesting, especially if I could then potentially also influence how an LLM trained on the output behaves and do even more damage using this down the line.

Let's say I'm a disgruntled website author. I want my users to see correct information on my website but don't want any LLM to be trained on it. In this case I could probably successfully use prompt injection to "poison" the model.