Hacker News
by anywhichway 314 days ago
> sometimes called prompt canarying or decoy system prompts.

Both "prompt canarying" and "decoy system prompts" return 0 hits on Google. Those aren't real things.

2 comments

Those terms describe a mechanism for detecting prompt injection. If that were what happened here, we should have seen the chatbot refuse, not lie.
Maybe it was trained on some internal documentation. ;)