Hacker News new | ask | show | jobs
by anyekwest 1158 days ago
What if we just train LLMs to remove prompt injections from inputs? I feel like this isn't an intractable problem.
2 comments

The author addressed this: why would the model built on the hallucinating technique be able to police the main hallucinator
He didn't really.
I now did in the parent comment :P
(author here) How do you know what's a prompt injection vs actual content? If you train another LLM to tell you what's a prompt injection, how do you know it has 100% coverage of all possible injections? OpenAI has been battling people trying to bypass their prompt re-write filter, and as far as I can see, not really winning, just constantly adding stuff to their blocklist until the next thing gets discovered.