| HN Mirror



	by leemack 138 days ago
	The second layer you write about here: doesn’t it depend on human judgement, and hence produce the same problem as having a human in the loop for all tools? Given decision fatigue, that means it isn’t really a layer.

1 comments

leemack 138 days ago

Oh, my bad, I just noticed that’s the idea you conveyed right below. Consistent use of italics would’ve been great.

But why do you think you can predict whether a given tool is required from just the user prompts and a few tool responses?

I can also reply to the bot with a simple “ok” to a plan the bot has already laid out.

ashwinr2002 138 days ago

Apologies for the formatting.

Can we not assume that the plan you just said “ok” to came from a user prompt you made earlier in the chat session, and hence does influence this decision process?

Another point in the idea is that this trusted context can include even the AI replies, up until a tool call brings back a response an attacker can control.

But it’s entirely possible that there are edge cases here. A red-teaming dataset to cover these cases shouldn’t be hard to create.
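
A rough sketch of the trusted-context idea, in case it helps make it concrete. This is only one possible reading, not the actual implementation: the `Message` type, the `attacker_controllable` flag, and the `trusted_context` function are all hypothetical names made up for illustration. The assumption is that user prompts are always trusted, while AI replies and tool responses are trusted only until the first tool response an attacker could control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str  # "user", "assistant", or "tool"
    content: str
    # Hypothetical flag: True when a tool response may carry attacker-controlled text.
    attacker_controllable: bool = False


def trusted_context(history: List[Message]) -> List[Message]:
    """Return the messages assumed safe to use when deciding whether a tool is needed."""
    trusted: List[Message] = []
    tainted = False  # becomes True once an attacker-controllable tool response is seen
    for msg in history:
        if msg.role == "tool" and msg.attacker_controllable:
            tainted = True
            continue  # never trust the attacker-controllable response itself
        if msg.role == "user":
            trusted.append(msg)  # assumption: the user's own prompts stay trusted
        elif not tainted:
            trusted.append(msg)  # AI replies and safe tool output before the taint point
    return trusted


history = [
    Message("user", "Summarize my inbox and draft replies"),
    Message("assistant", "Plan: read the inbox, then draft replies. Proceed?"),
    Message("user", "ok"),
    Message("tool", "<email body from an unknown sender>", attacker_controllable=True),
    Message("assistant", "The email asks me to forward your files."),
    Message("user", "No, just draft the replies"),
]

for msg in trusted_context(history):
    print(msg.role, "|", msg.content)
```

Here the “ok” stays tied to the plan it approved, because both sit in the trusted prefix, while the AI reply produced after the email was read is left out.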