by thanatropism
2951 days ago
I think what offpolicy was clumsily trying to say is that policy evaluation (I originally come from the world of economic policy econometrics) can be used for RL. Maybe it can, but isn't Bayesian stuff really costly most of the time?
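One concrete point of contact between the two worlds is off-policy evaluation: estimating how a new policy would perform using data logged under an old one. Below is a minimal sketch of the classic inverse propensity scoring (IPS) estimator. It assumes we logged, for each decision, the action taken, the reward observed, and the probability the logging policy assigned to that action. The function name, the tuple layout, and the toy data are illustrative assumptions, not anything from the thread.

```python
def ips_estimate(logs, target_prob):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave `action`.
    target_prob: function (context, action) -> probability of `action`
                 under the target policy we want to evaluate.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Hypothetical toy data: the logging policy picked actions 0/1 uniformly
# (prob 0.5 each); action 1 pays 1.0, action 0 pays 0.0.
example_logs = [
    (None, 0, 0.0, 0.5),
    (None, 1, 1.0, 0.5),
    (None, 1, 1.0, 0.5),
    (None, 0, 0.0, 0.5),
]

# Target policy: always choose action 1.
def always_one(context, action):
    return 1.0 if action == 1 else 0.0

print(ips_estimate(example_logs, always_one))  # 1.0
```

The estimator is unbiased when the logging probabilities are correct and nonzero wherever the target policy acts, but its variance grows quickly as the two policies diverge, which is one reason practitioners reach for doubly robust or model-based variants.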
RL is for training a system's parameters from positive or negative reinforcement provided by a critic. RL is formulated in terms of a Markov decision process (MDP). RL does have policy search, but that is separate from economic policy evaluation.
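For comparison, "policy evaluation" inside RL usually means something narrower: computing the expected discounted return of a fixed policy in a known MDP. A minimal sketch of iterative policy evaluation (repeated Bellman expectation backups) follows. The two-state MDP, the action names, and the discount factor of 0.9 are made-up assumptions chosen so the answer can be checked by hand.

```python
def policy_evaluation(states, policy, transitions, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation for a finite MDP.

    policy[s]: dict mapping action -> probability of taking it in state s.
    transitions[(s, a)]: list of (probability, next_state, reward) tuples.
    Returns a dict of state values V(s) under the fixed policy.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman expectation backup: average over the policy's actions
            # and over each action's possible next states.
            new_v = sum(
                a_prob * p * (r + gamma * values[s_next])
                for a, a_prob in policy[s].items()
                for p, s_next, r in transitions[(s, a)]
            )
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return values


# Hypothetical MDP: in A, "stay" pays 0.5 and loops; "go" pays 1.0 and
# moves to B, which loops forever with reward 0.
states = ["A", "B"]
policy = {"A": {"stay": 0.5, "go": 0.5}, "B": {"stay": 1.0}}
transitions = {
    ("A", "stay"): [(1.0, "A", 0.5)],
    ("A", "go"): [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 0.0)],
}
V = policy_evaluation(states, policy, transitions)
print(V)  # V["A"] is about 1.3636 (= 15/11), V["B"] == 0.0
```

Solving by hand, V(A) = 0.5(0.5 + 0.9 V(A)) + 0.5(1.0), so V(A) = 15/11. Nothing here estimates a causal effect from observational data, which is the sense of "policy evaluation" in econometrics.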