Hacker News
by tcdent 605 days ago
Literally everything is trivial to jailbreak.

The core idea is to pass information into the model using a cipher: one that isn't so hard that the model can't figure it out, but isn't so easy that it gets detected.

And yes, o1 was jailbroken shortly after its release: https://x.com/elder_plinius/status/1834381507978280989