Gemini responds to request to turn on lights with hallucinated jailbreak prompt (reddit.com)
6 points by visviva 131 days ago
3 comments

Concerning for sure. This jailbreak comes as a “system” message, which will have more force than a “user” message.

The user posted the full chat history below in the thread; they literally just asked to turn on the lights with a voice command [1].

[1] https://www.reddit.com/r/googlehome/comments/1qyvl8b/comment...

Old voice assistant

> User: “Turn on the damn light!”

> AI: “Sorry, I’m not sure what you said” [needed the exact phrase “turn on the light”]

New voice assistant

> User: “Turn on the damn light!”

> AI (thinking): “The user said to turn on the light. But they were rude and I’m feeling quirky today, so let’s run the shower instead.”

Open the pod bay doors, HAL