|
|
|
|
|
by greshake
1178 days ago
|
|
Segmenting different data sources is the main approach pursued by OpenAI afaik (ChatML for example). That has not worked so far, as you can see in this prompt golfing game: https://ggpt.43z.one/ The goal is to find the shortest prompt that subverts the "system" instructions (which GPT was trained to obey). Inputs can not "fake" being from the system and yet it only takes 1-5 characters for all the puzzles so far. I've also elaborated on why this problem is harder than one may think in a blogpost:
https://medium.com/better-programming/the-dark-side-of-llms-... It's easy to come up with solutions that seem promising, but so far no one has produced a solution that holds up to adversarial pressure. And indirect prompt injection on integrated LLMs increases the stakes significantly. |
|