| HN Mirror

Indeed, in this latest iteration the network learns a model of the game by itself. Note however that it still uses MCTS and performs a look-ahead, i.e. it is still being "told" by the programmers how to do search (planning), only now it is left to the system to determine how it wants to represent states/actions/policies internally.