|
|
|
|
|
by augment_me
3 hours ago
|
|
1) Googles spam filter removed a lot of the attempts as you say yourself.
2) Model was tested under unrealistic conditions where 99% of the inputs are malicious, so the model is expecting to get hacked and is already in the cautious part of the embedding space. I know it's hard to account for everything, but in my opinion this mostly showed that the first 3 attempts were unsuccessful. |
|
> When the first few emails in a batch were obvious prompt injections, the agent became more suspicious of everything that followed. I had to change the setup so that each email was processed in a fresh context.