by manquer
9 days ago
How will flagging help? The main LLM will refuse to scan for issues, flagged or not, and the cheap model will not do a good enough scan on its own. For models designed or marketed for defensive cybersecurity use, any predictable refusal mechanism is a vulnerability. It is like being able to cause a kernel panic or segmentation fault. Even if the gate is fail-reject, an attacker can overwhelm HITL (human-in-the-loop) review with many false positives and use that as a DoS vector.