by gabrielgoh
3270 days ago
I think an analogy can be made with Bayesian statistics. In principle, Bayesian statistics requires no training, just a way of sampling from the posterior, usually done with expensive MCMC methods. Here, we do not need training of any kind either, just a Monte Carlo simulation of the environment and an approximation of which path has the greatest path entropy. Basically, given a state, you:

- Compute the path entropy for all states you can move to
- Move into the state with the greatest path entropy

The tradeoff here is that all the work occurs at inference: every decision requires a complex simulation. In training-based approaches the heavy lifting is done during training, and inference is easy.
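To make the two steps concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not the method from the work being discussed: the environment is a made-up 1-D corridor with absorbing end cells, and `step`, `path_entropy` and `choose_move` are hypothetical names. Path entropy is estimated as the Shannon entropy of the empirical distribution over sampled random trajectories.

```python
import math
import random
from collections import Counter

def step(state, action, size):
    """Hypothetical toy dynamics: a 1-D corridor whose two end cells are absorbing."""
    if state == 0 or state == size - 1:
        return state  # stuck at a wall, so no futures remain open
    return state + action

def path_entropy(start, size, horizon=8, samples=2000, rng=random):
    """Monte Carlo estimate (in nats) of the entropy of random paths from `start`."""
    counts = Counter()
    for _ in range(samples):
        s, path = start, []
        for _ in range(horizon):
            s = step(s, rng.choice((-1, 1)), size)
            path.append(s)
        counts[tuple(path)] += 1  # tally each distinct state trajectory
    return -sum((c / samples) * math.log(c / samples) for c in counts.values())

def choose_move(state, size, **kw):
    """Greedy rule: move to the neighbouring state with the largest estimated path entropy."""
    candidates = {step(state, a, size) for a in (-1, 1)}
    return max(candidates, key=lambda s: path_entropy(s, size, **kw))
```

Note that `choose_move` runs thousands of simulated rollouts for a single decision and learns nothing it can reuse, which is the "all the work occurs at inference" tradeoff. In this toy corridor the rule steers away from the absorbing walls, since trajectories that get stuck all collapse onto one path.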
It's interesting that this merit function works in the absence of a real reward signal, but there's no fair comparison against systems that use a reward signal, because providing a perfect simulation of the environment is a huge alteration to the problem.