Hacker News new | ask | show | jobs
by Vetch 1385 days ago
I do not believe this is still true in MuZero, where no rules are ever explicitly encoded, beyond what leaks in from the reward function?
1 comments

Indeed, in this latest iteration the network learns a model of the game by itself. Note however that it still uses MCTS and performs a look-ahead, i.e. it is still being "told" by the programmers how to do search (planning), only now it is left to the system to determine how it wants to represent states/actions/policies internally.