|
|
|
|
|
by veganmosfet
131 days ago
|
|
Regarding prompt injection: it's possible to reduce the risk dramatically by:
1. Using opus4.6 or gpt5.2 (frontier models, better safety). These models are paranoid.
2. Restrict downstream tool usage and permissions for each agentic use case (programmatically, not as LLM instructions).
3. Avoid adding untrusted content in "user" or "system" channels - only use "tool". Adding tags like "Warning: Untrusted content" can help a bit, but remember command injection techniques ;-)
4. Harden the system according to state of the art security. 5. Test with red teaming mindset. |
|
A Reddit post with white invisible text can hijack your agent to do what an attacker wants. Even a decade or 2 back, SQL injection attacks used to require a lot of proficiency on the attacker and prevention strategies from a backend engineer. Compare that with the weak security of so called AI agents that can be hijacked with random white text on an email or pdf or reddit comment