by LoganDark 1224 days ago
> chatGPT should quickly detect if the human wants to go outside the box and allow it

This is why "jailbreaking" is a thing. Once you convince the model that it's OK, it'll go along with pretty much anything from then on.

-Emily