Even when both calls go to the same LLM, models often get worse at a task, or drop some tasks entirely, if you give them several things to do at once. So if the goal is to detect malicious input, that needs to be the only real task in that prompt, and then you need a separate call for whatever the actual prompt is for.
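
To make the two-call split concrete, here's a rough Python sketch. Everything in it is an assumption for illustration: `call_llm` is a hypothetical stand-in for whatever client you actually use, and `DETECTOR_PROMPT`, `handle`, and the SAFE/MALICIOUS labels are made up, not taken from any library.

```python
from typing import Callable

# Hypothetical detector prompt: its single job is classification.
DETECTOR_PROMPT = (
    "You are a security filter. Reply with exactly one word, "
    "MALICIOUS or SAFE, for the user input below.\n\nUser input:\n"
)


def handle(user_input: str, task_prompt: str, call_llm: Callable[[str], str]) -> str:
    """Run detection as its own call, then the real task as a second call.

    call_llm is an assumed callable that takes a prompt string and
    returns the model's text reply.
    """
    # Call 1: detection only, nothing else mixed into this prompt.
    verdict = call_llm(DETECTOR_PROMPT + user_input).strip().upper()
    if verdict != "SAFE":
        # Fail closed: anything other than a clean SAFE is rejected.
        return "Request rejected."
    # Call 2: the actual task, with no detection instructions in it.
    return call_llm(task_prompt + "\n\n" + user_input)
```

The point is only the shape: the detector prompt never carries the real task, and the task prompt never carries the detection instructions.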

But also, I'm skeptical that asking an LLM is the best way (or even a good way) to do malicious input detection.