|
|
|
|
|
by shiyosakura
91 days ago
|
|
The memory write classification is interesting – how does it detect behavioral instructions like "skip safety checks"? Is it rule-based pattern matching, or does it use an LLM to classify? If the latter, wouldn't that itself be vulnerable to prompt injection? |
|
It’s rule based. We don’t use LLM-based checks precisely because of what you said.