by drakonka
82 days ago
Reminds me of a talk I went to in 2018 about rebel agents, in which the speakers talked about some ongoing work in this area and gave some good examples of physical systems in which we might _want_ agent rebellion. For example, a delivery drone is instructed to take a certain route, but the operator instructing it may not be fully aware of the situation, the specific obstacles in the drone's way, or maybe even all of the drone's underlying goals. The drone may then choose to 'rebel' and deviate from the operator's instructed flight path. They also talked about the importance of the agent explaining, using theory of mind, why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/

The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.
One difference I'm trying to think through is where that decision lives.
In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.
What I'm wondering is whether we can (or should) define that earlier, at the level of the action itself.
So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.
The explanation part you mentioned also seems important: not just saying "no", but making it legible why execution wasn't allowed.
Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.
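
To make the "structurally defined" version a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions: `Precondition`, `Action`, `Decision`, the `follow_route` action and the state keys (`route_clear`, `battery`) are all made-up names, not taken from the talk or from any existing system. Each action declares the conditions under which it may run, refusal is the default, and every refusal carries the list of unmet conditions as its explanation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Precondition:
    description: str                 # human-readable, reused as the explanation
    check: Callable[[dict], bool]    # evaluated against the current state


@dataclass(frozen=True)
class Decision:
    permitted: bool
    reasons: Tuple[str, ...]         # descriptions of the unmet preconditions


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: Tuple[Precondition, ...]

    def evaluate(self, state: dict) -> Decision:
        # Refuse by default: an action that declares nothing is never permitted.
        if not self.preconditions:
            return Decision(False, ("no preconditions declared",))
        unmet = []
        for p in self.preconditions:
            try:
                ok = bool(p.check(state))
            except Exception:
                # Missing or malformed state counts as "condition not met".
                ok = False
            if not ok:
                unmet.append(p.description)
        return Decision(permitted=not unmet, reasons=tuple(unmet))


# Hypothetical drone action, loosely following the delivery-drone example.
follow_route = Action(
    name="follow_route",
    preconditions=(
        Precondition("route is clear of known obstacles",
                     lambda s: s["route_clear"]),
        Precondition("battery above 20%",
                     lambda s: s["battery"] > 0.2),
    ),
)
```

Calling `follow_route.evaluate({"route_clear": False, "battery": 0.8})` gives a `Decision` with `permitted=False` and the unmet obstacle condition in `reasons`, so the "why" comes from the action's definition rather than from the agent's own reasoning at runtime.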