by offpolicy
2953 days ago
Nothing to see here. The do-calculus is just fancy notation for what reinforcement learning is already doing: trying different possible actions and maximizing reward. If you know the possible actions in advance, this is basically minimizing the regret from taking the wrong actions under your policy.
Second, consider this: classic ML techniques fit to observational data will tell you that you should never go to the doctor, because going is associated with a higher probability that you have a disease. Causal inference does not have this problem.
How does RL dodge this?
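
To make the doctor example concrete, here is a rough sketch of the gap between conditioning on a visit and forcing one. Everything in it is an assumption for illustration: the numbers are made up, the function names are hypothetical, and the causal graph is assumed to be just disease -> visit (being sick makes you more likely to go, and going does not change whether you are sick).

```python
# Hypothetical numbers, chosen only for illustration.
P_DISEASE = 0.10            # P(D=1), prior probability of being sick
P_VISIT_GIVEN_SICK = 0.80   # P(V=1 | D=1)
P_VISIT_GIVEN_WELL = 0.10   # P(V=1 | D=0)


def p_disease_given_visit(visit: bool) -> float:
    """Observational P(D=1 | V=visit), computed with Bayes' rule."""
    like_sick = P_VISIT_GIVEN_SICK if visit else 1 - P_VISIT_GIVEN_SICK
    like_well = P_VISIT_GIVEN_WELL if visit else 1 - P_VISIT_GIVEN_WELL
    joint_sick = P_DISEASE * like_sick
    joint_well = (1 - P_DISEASE) * like_well
    return joint_sick / (joint_sick + joint_well)


def p_disease_do_visit(visit: bool) -> float:
    """Interventional P(D=1 | do(V=visit)).

    Under the assumed graph D -> V, forcing V removes the arrow into V,
    so D keeps its prior no matter which value V is set to.
    """
    return P_DISEASE


if __name__ == "__main__":
    print("P(D | V=1)     =", p_disease_given_visit(True))
    print("P(D | V=0)     =", p_disease_given_visit(False))
    print("P(D | do(V=1)) =", p_disease_do_visit(True))
    print("P(D | do(V=0)) =", p_disease_do_visit(False))
```

A model that only sees logged visits learns the first pair of numbers and concludes that visiting is bad news. Whether RL avoids this seems to depend on where its data comes from: an agent that picks the action itself is effectively sampling from the second pair, while one trained purely on logged behavior sees the first.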