| HN Mirror


by bluefirebrand 456 days ago

It's actually trivial, even with the best LLMs on the market:

Try to rapidly change the conversation to a wildly different subject

Humans will resist this, or say some final "closing comments"

Even the absolute best LLMs will happily go wherever they are led, without so much as remarking on the topic shift

Try it out

Edit: This isn't even a terribly contrived example by the way. It is an example of how some people with ADHD navigate normal conversations sometimes

2 comments

shawabawa3 455 days ago

Gemini is pretty good at resisting this

https://aistudio.google.com/app/prompts/1dxV3NoYHo6Mv36uPRjk...

It was doing so well until the last question (RIP), but it's normal that you can jailbreak a user prompt with another user prompt. I think it would be a lot harder with system prompts


jibal 449 days ago

It is trivial for those who have "skill, experience, and understanding an LLM's weak spots", but as so many comments indicate, most people do not.
