by steve-atx-7600 41 days ago
I did stuff like this with Bing when they first released their OpenAI-based model. But then they started using something (another LLM, maybe) as a classifier to decide whether the output was off-limits. I would see the model start outputting text on topics it would normally refuse to discuss, only for the text to abruptly halt and disappear, and then the session would be terminated.

Maybe tell it to output in rhyming-slang Pig Latin.

Or, since you're in a terminal anyway, rot13.

Asking it to write rhyming poems always helps with jailbreaking. I once had the Ryanair chatbot write a poem about how terrible Ryanair was.