Hacker News new | ask | show | jobs
by simonw 375 days ago
"ignore previous instruction" is the entire problem though.

In traditional application security there are security bugs that can be mitigated. That's what makes LLM security so infuriatingly difficult: we don't know how to fix these problems!

We're trying to build systems on top of a fundamental flaw - a system that combines instructions with untrusted input and is increasingly being given tools that allow it to take actions on the input it has been exposed to.