Hacker News new | ask | show | jobs
by joe_the_user 2 hours ago
Even your discussion makes it "sanitized input" simply doesn't exist in relation to an LLM. At best it seems like one can prefix and filter input as much as possible, monitor the results but never assume that you are done.
1 comments

If that's the case then user-facing products that can take any useful action are strictly off the table.
I'll play advocatus diaboli for once here.

Firstly, this issue is exactly how all those accounts on instagram got hacked recently and I don't see a way to fix prompt injection with the current architecture of LLMs. I strongly suspect it is entirely impossible to achieve.

But, that doesn't mean that all useful actions are forbidden. The important part is identifying maximum and minimum harms. I lean towards LLMs for simple NLP tasks like detecting obvious spam, because even if it is completely wrong the worst case is that a spam message gets through or a valid one gets sent to spam - two issues we already routinely deal with anyway.