Hacker News new | ask | show | jobs
by 0xkvyb 55 days ago
Still crazy how easy it is to "jailbreak" even SOTA LLMs with a simple assistantResponse replacement in chat thread.
1 comments

Tell us more.
I think what he is saying is they are stateless so you can edit its previous repsonses and it just goes with it.
If you build a small ui that lets you edit the models response too it’s pretty funny to do.

It sees that it “said” it and gets very confused.

I have seen it where you can just report it said it and it will be confused.