by cweill
1265 days ago
I also have this question. Is the RL MDP actually encoding cause and effect? Or just learning (bidirectional) correlations between states and actions? I wonder if Pearl thinks that RL replicates his do-calculus under the hood, or if that's an innovation we're missing.