Hacker News
by eli_gottlieb 3129 days ago
The basic problem with carrying AlphaGo Zero's approach over to other domains is that the state of a Go game is fully deterministic, fully Markovian, and fully amenable to quick simulation. The player makes a move, and the simulator computes the next game state in milliseconds from the current game state alone. That is what lets the AlphaGo Zero agent train so quickly through self-play.
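
To make "deterministic, Markovian, cheap to simulate" concrete, here is a minimal sketch. It assumes a toy game (single-pile Nim: take 1 to 3 stones, taking the last stone wins) as a stand-in for Go, and a uniform-random policy instead of anything resembling AlphaGo Zero's network and search. All names (State, step, self_play_game, replay) are hypothetical. The point is that step() is a pure function of (state, move), so self-play games are essentially free to generate and replay exactly.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    stones: int   # stones left in the single pile (toy stand-in for a Go board)
    to_move: int  # player about to move: 0 or 1


def legal_moves(s: State) -> list[int]:
    return [k for k in (1, 2, 3) if k <= s.stones]


def step(s: State, move: int) -> State:
    # Deterministic and Markovian: the result depends only on (s, move), never on history.
    if move not in legal_moves(s):
        raise ValueError(f"illegal move {move} in {s}")
    return State(s.stones - move, 1 - s.to_move)


def self_play_game(start: State, rng: random.Random) -> tuple[list[int], int]:
    """Play one game with a uniform-random policy on both sides.

    Returns the moves played and the winner (the player who took the last stone).
    """
    s, moves = start, []
    while s.stones > 0:
        move = rng.choice(legal_moves(s))
        moves.append(move)
        s = step(s, move)
    return moves, 1 - s.to_move  # the player who just moved emptied the pile


def replay(start: State, moves: list[int]) -> State:
    # Because step() is a pure function, replaying recorded moves reproduces the game exactly.
    s = start
    for m in moves:
        s = step(s, m)
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    start = State(stones=10, to_move=0)
    games = [self_play_game(start, rng) for _ in range(10_000)]
    assert all(replay(start, moves) == State(0, 1 - winner) for moves, winner in games)
    print(sum(w == 0 for _, w in games), "of", len(games), "random games won by player 0")
```

Ten thousand games run in a fraction of a second here; the simulator costs almost nothing next to whatever learner consumes the games.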

If you start requiring high-dimensional empirical data where the generating dynamics aren't Markovian (or aren't neatly predictable with a Markovian simulator, even if God considers them fully determined), you start having to do stuff like full-blown physics simulations while also specifying agent goals in terms of those physical states. Then you've got the machine learning part and the simulation part taking up comparable amounts of compute power, and self-supervised training becomes much more difficult.
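
A toy illustration of the parenthetical, as a sketch under stated assumptions rather than a model of any real physical system (all names are hypothetical): a hidden phase counter evolves fully deterministically, but the agent observes only one bit of it. The current observation alone does not determine the next one, so no simulator keyed only on the current observation can be Markovian, even though the underlying dynamics are fully determined. Conditioning on the last two observations restores predictability in this toy case; in real high-dimensional systems, that extra state (history or hidden variables) is what becomes expensive to carry and simulate. The check below is empirical over a finite sample, not a proof.

```python
from __future__ import annotations

from collections import defaultdict


def hidden_step(t: int) -> int:
    # Fully deterministic hidden dynamics: a phase counter mod 4.
    return (t + 1) % 4


def observe(t: int) -> int:
    # The agent sees only one bit of the hidden phase.
    return 1 if t in (0, 1) else 0


def observation_sequence(n: int, t0: int = 0) -> list[int]:
    obs, t = [], t0
    for _ in range(n):
        obs.append(observe(t))
        t = hidden_step(t)
    return obs


def successors_by_context(obs: list[int], k: int) -> dict[tuple[int, ...], set[int]]:
    """Map each window of the last k observations to the set of observations that followed it."""
    table: dict[tuple[int, ...], set[int]] = defaultdict(set)
    for i in range(k - 1, len(obs) - 1):
        table[tuple(obs[i - k + 1 : i + 1])].add(obs[i + 1])
    return dict(table)


def predictable_from_window(obs: list[int], k: int) -> bool:
    # True if every k-observation window was followed by exactly one distinct observation.
    return all(len(nexts) == 1 for nexts in successors_by_context(obs, k).values())


if __name__ == "__main__":
    obs = observation_sequence(12)
    print(obs)
    print("last observation only:", successors_by_context(obs, 1))
    print("last two observations:", successors_by_context(obs, 2))
```

With one observation of context, an observed 1 is sometimes followed by 1 and sometimes by 0; with two, every window has a single successor.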


I agree that partial observability and imperfect information make generalization computationally difficult. Do you know offhand of any interesting research on optimizations for this problem?