|
|
|
|
|
by hulium
287 days ago
|
|
During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no: > MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories
it is trained on. https://arxiv.org/pdf/1911.08265 |
|