Hacker News
by cweill 1265 days ago
I also have this question. Does the MDP in RL actually encode cause and effect, or does it just learn (bidirectional) correlations between states and actions?

I wonder if Pearl thinks that RL replicates his do-calculus under the hood, or if that's an innovation we're missing.
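
To make the distinction concrete, here is a toy sketch of my own. It is a hypothetical three-variable model with made-up numbers, not anything from Pearl or from an RL library. A hidden confounder U influences both the logged action A and the reward R. Conditioning on A (what you could estimate from logged data alone) gives a very different number than setting A yourself, which is roughly what an agent does when it picks its own actions.

```python
# Hypothetical toy model (all probabilities are illustrative assumptions):
#   U: hidden confounder, A: action, R: binary reward.
P_U1 = 0.5  # P(U = 1)


def p_u(u):
    """Prior over the hidden confounder."""
    return P_U1 if u == 1 else 1.0 - P_U1


def p_a_given_u(a, u):
    """Behavior policy in the logged data: the action depends on U."""
    p_a1 = 0.9 if u == 1 else 0.1
    return p_a1 if a == 1 else 1.0 - p_a1


def p_r1_given_u_a(u, a):
    """P(R = 1 | U, A): U matters a lot, A adds only 0.1."""
    return 0.2 + 0.6 * u + 0.1 * a


def observational(a):
    """P(R = 1 | A = a): condition on the logged action (correlation)."""
    joint = sum(p_u(u) * p_a_given_u(a, u) * p_r1_given_u_a(u, a) for u in (0, 1))
    marginal = sum(p_u(u) * p_a_given_u(a, u) for u in (0, 1))
    return joint / marginal


def interventional(a):
    """P(R = 1 | do(A = a)): set A directly, so U keeps its prior."""
    return sum(p_u(u) * p_r1_given_u_a(u, a) for u in (0, 1))


obs_gap = observational(1) - observational(0)
do_gap = interventional(1) - interventional(0)
print(f"observational gap:  {obs_gap:.2f}")  # 0.58
print(f"interventional gap: {do_gap:.2f}")   # 0.10
```

In this toy setup the logged data makes action 1 look far better than it is (0.58 vs. a true effect of 0.10), because U drives both the action and the reward. Whether a given RL algorithm ends up closer to the first number or the second presumably depends on whether it learns from its own actions or from someone else's logs, which is the part of the question I can't answer.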