by eli_gottlieb
3129 days ago
The basic problem with generalizing AlphaGo Zero is that it relies on the state of a Go game being fully deterministic, fully Markovian, and fully amenable to quick simulation. The player makes a move, and the simulator computes the next game-state in milliseconds from only the current game-state. This is what lets the AlphaGo Zero agent train so quickly on self-play. If you start requiring high-dimensional empirical data where the generating dynamics aren't Markovian (or aren't neatly predictable with a Markovian simulator, even if God considers them fully determined), you start having to do stuff like full-blown physics simulations while also specifying agent goals in terms of those physical states. Then you've got the machine learning part and the simulation part taking up comparable amounts of compute power, and self-supervised training becomes much more difficult.
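To make the "next game-state from only the current game-state" point concrete, here is a minimal sketch of a deterministic, Markovian simulator driving a self-play loop. It uses a toy take-away game (remove 1 to 3 stones, last stone wins) instead of Go, and a random policy instead of a learned one. The names `State`, `legal_moves`, `step`, and `self_play` are all hypothetical, made up for illustration, and this is not how AlphaGo Zero itself is implemented.

```python
import random
from typing import NamedTuple


class State(NamedTuple):
    stones: int   # stones left in the pile
    to_move: int  # player to move: 0 or 1


def legal_moves(state):
    # A player may take 1, 2, or 3 stones, but no more than remain.
    return [n for n in (1, 2, 3) if n <= state.stones]


def step(state, move):
    # Deterministic and Markovian: the next state is a pure function
    # of (current state, move). No history, no hidden variables.
    return State(state.stones - move, 1 - state.to_move)


def self_play(start, rng):
    # Play one game with a random policy (a stand-in for a learned one).
    state = start
    trajectory = []
    while state.stones > 0:
        move = rng.choice(legal_moves(state))
        trajectory.append((state, move))
        state = step(state, move)
    # to_move has already flipped, so the winner is the other player,
    # i.e. the one who took the last stone.
    winner = 1 - state.to_move
    return trajectory, winner


if __name__ == "__main__":
    trajectory, winner = self_play(State(10, 0), random.Random(0))
    print(len(trajectory), "moves, winner:", winner)
```

Because `step` is this cheap, generating training games costs almost nothing next to the learning itself. Swap `step` for a physics simulation and that stops being true.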