|
|
|
|
|
by leemack
138 days ago
|
|
Oh my bad just noticed that’s what you conveyed an idea for right below
Use of consistent italics would’ve been great But why is it that you think you can predict if a tool name is required by user prompts and a few tool responses ? I can also reply to the bot as simply “ok” for a plan already rolled out by the bot |
|
Can we not assume that the plan you just said “ok” to came from a user prompts you made earlier in the chat session and hence does influence this decision process.
Another point in the idea is that this trusted context can include even the AI replies up until there hasnt been a tool calls yet that brings back a response an attacker can control
But it’s entirely possible that there are edge cases here, a red teaming dataset to cover these cases shouldn’t be hard to create