|
|
|
|
|
by wll
1127 days ago
|
|
It’s a good start. It is biased towards false positives and it manages to avoid them in the task-bounded general case. Here’s an unprompted example. [0] A hundred tries could also be detected by themselves with more traditional means. I don’t want go into farfetched territory, but here I disagree with Simon [1]: just as it is impossible to perfectly secure a user-oriented operating system without severely limiting it (see Lockdown Mode [2]), it might be impossible to prove injection-resistance in LLMs short of foundational advancements, but that doesn’t mean that we should dismiss attempts to mitigate, just as we don’t dismiss Apple for releasing priority security updates for a billion people’s devices, devices containing their most personal and sensitive data. [0] https://news.ycombinator.com/item?id=35926188 [1] https://news.ycombinator.com/item?id=35925858 [2] https://support.apple.com/en-us/HT212650 |
|