I've been thinking about AI systems acting in the physical world. Most discussions about control focus on what the system should do and how to make execution reliable. But a lot of real-world failures don't seem to be about incorrect execution. They're about execution happening at all. An action can be technically correct (executed exactly as specified) and still be the wrong thing to do because the context has changed.

This made me wonder whether control should be framed differently. Instead of focusing on defining actions, maybe we should focus on defining when actions are allowed to happen. In other words, control might be less about execution and more about permission. If conditions aren't satisfied, the system shouldn't try and fail; it simply shouldn't execute.

I'm curious whether people have seen similar issues in real-world systems, or whether this framing connects to existing work.
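To make the permission framing a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the `GatedAction` class, the `open_valve` action, and the context keys `pressure_kpa` and `operator_present` are names I made up, not part of any existing system or library. The idea is that an action carries its own preconditions, and when any of them fails the action body is never entered.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical snapshot of the world at decision time.
Context = Dict[str, Any]


@dataclass
class GatedAction:
    """An action that may only run when every precondition holds."""
    name: str
    preconditions: List[Callable[[Context], bool]]
    execute: Callable[[Context], str]

    def permitted(self, ctx: Context) -> bool:
        # Permission is checked against the current context, not the
        # context that existed when the action was planned.
        return all(check(ctx) for check in self.preconditions)

    def attempt(self, ctx: Context) -> Optional[str]:
        # Not "try and fail": without permission, execute() is never called.
        if not self.permitted(ctx):
            return None
        return self.execute(ctx)


# Illustrative action with made-up thresholds and context keys.
# Missing readings default to values that deny permission (fail closed).
open_valve = GatedAction(
    name="open_valve",
    preconditions=[
        lambda ctx: ctx.get("pressure_kpa", float("inf")) < 500.0,
        lambda ctx: bool(ctx.get("operator_present", False)),
    ],
    execute=lambda ctx: "valve opened",
)

allowed = open_valve.attempt({"pressure_kpa": 300.0, "operator_present": True})
blocked = open_valve.attempt({"pressure_kpa": 800.0, "operator_present": True})
```

In this sketch `allowed` holds the result of running the action, while `blocked` is `None` because the pressure check failed and the action never started. Defaulting missing readings to "not permitted" is one possible design choice; a real system might instead treat missing data as its own state.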
They also talked about the importance of the agent explaining why it rebelled, using theory of mind. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/