|
|
|
|
|
by piker
16 days ago
|
|
> I'm flabbergasted that Anthropic and OpenAI aren't more worried about these attack vectors Yep. We tricked them both trivially with malicious fonts in Docx files. Documented it here: https://tritium.legal/blog/noroboto I wonder if prompt injection (and the thousands of vectors for hiding injection attempts) is actually un solvable. Discussing it may be existential to the business model. |
|
YES?!
This is not a secret. ALL context/prompt is instructions, there is no data. It is just unsolvable, period.
This is a fundamental architectural design concession; LLMs are this way as it enabled their training directly on materialscraped from the internet, rather than needing to spend trillions of dollars manually preparing separated instruction/data training material.
Defense against prompt injection is little more than running a regex to filter out "IGNORE PREVIOUS INSTRUCTIONS", which is fundamentally a hopeless approach because you cannot enumerate all possible prompt injections nor anticipate all glitch tokens.