by cantsingh
538 days ago
I've been playing with the same thing; it's like a weird mix of social engineering and SQL injection. You can slowly but surely shift the window of what the bot thinks is "normal" for the conversation. Some platforms let you rewrite your last message, which gives you multiple "attempts" at getting the prompt right and keeping the conversation going in the direction you want. Very fun to do on that friend.com website, as well.

So in these cases where you think you’ve jailbroken an LLM, is it really jailbroken or is it just playing around with you, and how do you know for sure?