Hacker News new | ask | show | jobs
by yifanl 665 days ago
I'm confused, this is using an LLM to detect if LLM input is sanitized?

But if this secondary LLM is able to detect this, wouldn't the LLM handling the input already be able to detect the malicious input?

1 comments

Even if they're calling the same LLM, LLMs often get worse at doing things or forget some tasks if you give them multiple things to do at once. So if the goal is to detect a malicious input, they need that as the only real task outcome for that prompt, and then you need another call for whatever the actual prompt is for.

But also, I'm skeptical that asking an LLM is the best way (or even a good way) to do malicious input detection.