Hacker News new | ask | show | jobs
by bluefirebrand 409 days ago
It's actually trivial, even with the best LLMs on the market:

Try to rapidly change the conversation to a wildly different subject

Humans will resist this, or say some final "closing comments"

Even the absolute best LLMs will happily go wherever they are led, without commenting remotely on topic shifts

Try it out

Edit: This isn't even a terribly contrived example by the way. It is an example of how some people with ADHD navigate normal conversations sometimes

2 comments

Gemini is pretty good at resisting this

https://aistudio.google.com/app/prompts/1dxV3NoYHo6Mv36uPRjk...

It was doing so well until the last question :rip: but it's normal that you can jailbreak a user prompt with another user prompt, I think with system prompts it would be a lot harder

It is trivial for those who have "skill, experience, and understanding an LLM's weak spots", but as some many comments indicate, most people do not.