by charcircuit
843 days ago
Your point is that if you don't try to bypass the safety then you probably cannot bypass it? That does not contradict my point: if you do try, by doing a prompt injection on the backtranslation, you can bypass the safety.
|
1) subtle enough that it doesn't immediately trigger the LLM filter
2) overt enough that the details relevant to the jailbreak can be recovered from the LLM's output and put into the backtranslation
I suspect that with current transformer LLMs these are mutually incompatible goals.