Hacker News new | ask | show | jobs
by arw0n 263 days ago
The biggest and most difficult to mitigate attack vector is indirect prompt injection.[0] So far most case studies have been injecting malicious prompts at inference, but there is good reason to believe you can do this effectively at different stages of training as well.[1] By layering obfuscation techniques, these become very hard to detect.

[0] https://arxiv.org/abs/2302.12173

[1] https://arxiv.org/html/2410.14827v3