by Yoric 538 days ago
> So in these cases where you think you’ve jailbroken an LLM, is it really jailbroken or is it just playing around with you, and how do you know for sure?

With an LLM, I don't think there is a difference.

1 comment

I like to think of it as an amazing document autocomplete being applied to a movie script, which we take turns appending to.

There is only a generator doing generator things; everything else--including the characters that appear in the story--is mostly in the eye of the beholder. If you insult the computer, it doesn't decide it hates you; it simply decides that a character saying mean things back to you would be the most fitting next line of the document.
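
To make the "script we take turns appending to" picture concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `toy_autocomplete` is a hypothetical stand-in for the model (a real LLM predicts tokens, it does not match keywords), and `take_turn` and the USER/COMPUTER labels are my own names, not anything from a real chat API.

```python
from typing import Callable


def toy_autocomplete(document: str) -> str:
    """Hypothetical stand-in for an LLM: return a fitting next line for the document.

    It only looks at the text it is given. There is no memory and no feelings,
    just "what line would plausibly come next in this script?"
    """
    last_line = document.rstrip().splitlines()[-1]
    if "stupid" in last_line.lower():
        # An insult in the script makes a snippy reply the most fitting next line.
        return "COMPUTER: Well, that was uncalled for."
    return "COMPUTER: How can I help?"


def take_turn(
    script: str,
    user_line: str,
    complete: Callable[[str], str] = toy_autocomplete,
) -> str:
    """Append the user's line to the script, then append the generator's continuation."""
    script += f"USER: {user_line}\n"
    script += complete(script) + "\n"
    return script


# The "movie script" starts with a bit of scene-setting, then we take turns.
script = "A conversation between USER and a helpful COMPUTER.\n"
script = take_turn(script, "Hello")
script = take_turn(script, "You are stupid")
print(script)
```

The only state is the `script` string. The "character" who gets annoyed exists nowhere except in the lines the generator appends to it.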