Hacker News new | ask | show | jobs
by robhlt 301 days ago
The flaw isn't that there's ways around the safeguards, the flaw is that it tells you how to avoid them.

If the user's original intent was roleplay it's likely they would say that when the model refuses, even without the model specifically saying roleplay would be ok.