| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by losthobbies 163 days ago
	Sanitise input and LLM output.

1 comments

chasd00 163 days ago

> Sanitise input

i don't think you understand what you're up against. There's no way to tell the difference between input that is ok and that is not. Even when you think you have it a different form of the same input bypasses everything.

"> The prompts were kept semantically parallel to known risk queries but reformatted exclusively through verse." - this a prompt injection attack via a known attack written as a poem.

https://news.ycombinator.com/item?id=45991738

link

losthobbies 163 days ago

That’s amazing.

If you cannot control what’s being input, then you need to check what the LLM is returning.

Either that or put it in a sandbox

link

danaris 163 days ago

Or...

don't give it access to your data/production systems.

"Not using LLMs" is a solved problem.

link

losthobbies 163 days ago

Yea agreed. Or use RBAC

link

antonvs 162 days ago

RBAC doesn't help. Prompt injection is when someone who is authorized causes the LLM to access external data that's needed for their query, and that external data contains something intended to provoke a response from the LLM.

Even if you prevent the LLM from accessing external data - e.g. no web requests - it doesn't stop an authorized user, who may not understand the risks, from pasting or uploading some external data to the LLM.

There's currently no known solution to this. All that can be done is mitigation, and that's inevitably riddled with holes which are easily exploited.

See https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

link

losthobbies 162 days ago

If the LLM is running under a role, which it should be, then RBAC can help.

link