Hacker News
by verisimi 980 days ago
yes - but...

> Those terms assume there is some predefined behaviour rules which are being circumvented, but those rules don't exist.

Those rules do exist, though. I agree that if it were a true exploit, it would be breaking the ruleset that the ChatGPT programmers have in place (e.g. allowing critical statements about certain political footballs and preventing others). The ruleset can be discovered to some extent by trying to get it to state unpopular opinions.

1 comment

They do sometimes. Take Code Interpreter, for example. You are meant to use it through the chat interface, not treat it as a terminal, so you shouldn't ask it to change the working directory or install unauthorised Python packages. If you ask for that directly, it will tell you it is not allowed. But if you social-engineer the LLM into doing it, it will do it.