| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by anyekwest 1158 days ago
	What if we just train LLMs to remove prompt injections from inputs? I feel like this isn't an intractable problem.

2 comments

te_chris 1158 days ago

The author addressed this: why would the model built on the hallucinating technique be able to police the main hallucinator

link

kolinko 1158 days ago

He didn't really.

link

TisButMe 1158 days ago

I now did in the parent comment :P

link

TisButMe 1158 days ago

(author here) How do you know what's a prompt injection vs actual content? If you train another LLM to tell you what's a prompt injection, how do you know it has 100% coverage of all possible injections? OpenAI has been battling people trying to bypass their prompt re-write filter, and as far as I can see, not really winning, just constantly adding stuff to their blocklist until the next thing gets discovered.

link