HN Mirror

	by mosselman 294 days ago
	What are some good prevention mechanisms for this? A sort of firewall for prompts? I've seen people recommend LLMs, but that seems like it wouldn't work well. What is the industry standard? Or what looks promising at least?

3 comments

hoppp 294 days ago

Nothing yet. Probably a new kind of model needs to be trained that can find injected prompts, sort of like an immune system for LLMs. The sanitized data can then be passed to the LLM.

No real solution for it yet. I would be interested to try to train a model for this but no budget atm.
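
To make the shape of that pipeline concrete, here is a minimal sketch of "scan the untrusted data first, then pass the sanitized version to the LLM". Everything in it is an assumption for illustration: `scan_untrusted`, `build_prompt`, and the regex list are hypothetical names, and the regexes are only a stand-in for the trained detector, which does not exist yet. Pattern matching like this is easy to bypass and is not a real defense.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a trained injection detector.
# A real detector would be a learned model; these patterns are illustrative only.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system|previous) prompt", re.I),
]


@dataclass
class ScanResult:
    clean_text: str      # untrusted text with flagged lines removed
    flagged: List[str]   # lines the detector rejected


def scan_untrusted(text: str) -> ScanResult:
    """Drop every line of untrusted text that matches a suspicious pattern."""
    kept, flagged = [], []
    for line in text.splitlines():
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS):
            flagged.append(line)
        else:
            kept.append(line)
    return ScanResult("\n".join(kept), flagged)


def build_prompt(task: str, untrusted: str) -> str:
    """Sanitize the untrusted data, then wrap it in delimiters after the task."""
    result = scan_untrusted(untrusted)
    return f"{task}\n\n<data>\n{result.clean_text}\n</data>"
```

Calling `build_prompt("Summarize:", page_text)` would return the prompt string to send to whatever model you use. The point is only the ordering (detect, strip, then forward); the detector itself is the unsolved part.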

m-hodges 293 days ago

I have bad news: https://matthodges.com/posts/2025-08-26-music-to-break-model...

yencabulator 293 days ago

https://simonwillison.net/tags/lethal-trifecta/
