by losthobbies 163 days ago
Sanitise input and LLM output.

> Sanitise input

I don't think you understand what you're up against. There's no way to tell the difference between input that is OK and input that is not. Even when you think you have it, a different form of the same input bypasses everything.

> "The prompts were kept semantically parallel to known risk queries but reformatted exclusively through verse."

This is a prompt injection attack: a known attack rewritten as a poem.

https://news.ycombinator.com/item?id=45991738

That’s amazing.

If you cannot control what’s being input, then you need to check what the LLM is returning.
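For what it's worth, here is a rough sketch of what "check what the LLM is returning" could look like for one narrow case: refusing output that links to hosts you did not expect, since links and images are a common exfiltration channel. The allowlist, the host name, and the function names are all made up for illustration, and this is a mitigation, not a fix.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; the thread does not name any hosts.
ALLOWED_HOSTS = {"docs.example.com"}

# Deliberately simple URL matcher; it stops at whitespace, quotes,
# angle brackets and a closing parenthesis (as in markdown links).
URL_RE = re.compile(r"https?://[^\s)\"'<>]+")


def find_disallowed_urls(llm_output: str) -> list[str]:
    """Return every URL in the output whose host is not on the allowlist."""
    bad = []
    for url in URL_RE.findall(llm_output):
        try:
            host = urlparse(url).hostname or ""
        except ValueError:
            # Malformed URL: treat it as disallowed rather than guess.
            host = ""
        if host not in ALLOWED_HOSTS:
            bad.append(url)
    return bad


def check_output(llm_output: str) -> str:
    """Pass the output through unchanged, or raise if it links somewhere unexpected."""
    bad = find_disallowed_urls(llm_output)
    if bad:
        raise ValueError(f"blocked LLM output, disallowed URLs: {bad}")
    return llm_output
```

This only covers one channel. Output that leaks data as plain text, or that triggers a tool call, would need its own checks.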

Either that, or put it in a sandbox.

Or...

don't give it access to your data/production systems.

"Not using LLMs" is a solved problem.

Yeah, agreed. Or use RBAC.

RBAC doesn't help. Prompt injection is when someone who is authorized causes the LLM to access external data that's needed for their query, and that external data contains something intended to provoke a response from the LLM.

Even if you prevent the LLM from accessing external data - e.g. no web requests - it doesn't stop an authorized user, who may not understand the risks, from pasting or uploading some external data to the LLM.

There's currently no known solution to this. All that can be done is mitigation, and that's inevitably riddled with holes which are easily exploited.

See https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
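As I read that post, the trifecta is an agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally. A minimal sketch of auditing a deployment for that combination might look like this; the class and field names are my own, not from the post.

```python
from dataclasses import dataclass


@dataclass
class AgentCapabilities:
    # Hypothetical capability flags for one LLM deployment.
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risky capabilities are present together."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )
```

Removing any one of the three makes the check return False, which is the mitigation the post argues for. It says nothing about how hard it is to actually remove one in practice.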

If the LLM is running under a role, which it should be, then RBAC can help.
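To make that concrete, here is a minimal sketch of running the LLM's tool calls under a role, assuming the model proposes a call and your own code decides whether to execute it. The role names, permission sets, and tool names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "support_bot": {"read_ticket"},
    "admin": {"read_ticket", "delete_ticket"},
}


@dataclass
class ToolCall:
    name: str   # tool the LLM asked for
    args: dict  # keyword arguments the LLM supplied


def authorize(role: str, call: ToolCall) -> bool:
    """Allow the call only if the role explicitly includes that tool."""
    return call.name in ROLE_PERMISSIONS.get(role, set())


def execute(role: str, call: ToolCall, tools: dict):
    """Run the tool on the LLM's behalf, but only within the role's permissions."""
    if not authorize(role, call):
        raise PermissionError(f"role {role!r} may not call {call.name!r}")
    return tools[call.name](**call.args)
```

This caps what an injected instruction can do at whatever the role allows. It does not stop an injection from misusing the permissions the role does have, which is the objection raised above.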