by gracenotes 3754 days ago
I took a closer read through the AlphaGo paper today. There are some other features that make it not general.

In particular, the initial input to the neural networks is a 19×19×48 grid, and the layers of this grid include information like:

- How many turns since a move was played

- Number of liberties (empty adjacent points)

- How many opponent stones would be captured

- How many of own stones would be captured

- Number of liberties after this move is played

- Whether a move at this point is a successful ladder capture

- Whether a move at this point is a successful ladder escape

- Whether a move is legal and does not fill its own eyes

Again, all of this is computed before the neural nets even get involved. Some of these layers are repeated 8 times for symmetry. I would say that for some of these features, AlphaGo got domain-specific help in a non-general way.
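
To make the "feature planes" idea concrete, here is a minimal sketch of how one of the features above (number of liberties) could be computed by ordinary code and turned into binary input layers. The one-hot layout over 8 planes, the board encoding, and the names `group_liberties` and `liberty_planes` are my own assumptions for illustration, not taken from the paper's implementation.

```python
from collections import deque

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical board encoding


def group_liberties(board, row, col):
    """Count the empty points adjacent to the group containing (row, col)."""
    size = len(board)
    color = board[row][col]
    if color == EMPTY:
        return 0
    seen = {(row, col)}
    liberties = set()
    queue = deque([(row, col)])
    while queue:  # flood fill over same-colored neighbors
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            if board[nr][nc] == EMPTY:
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(liberties)


def liberty_planes(board, num_planes=8):
    """Assumed one-hot layout: plane k-1 marks stones whose group has k
    liberties; the last plane marks num_planes or more."""
    size = len(board)
    planes = [[[0] * size for _ in range(size)] for _ in range(num_planes)]
    for r in range(size):
        for c in range(size):
            if board[r][c] == EMPTY:
                continue
            k = min(group_liberties(board, r, c), num_planes)
            if k > 0:  # a legal position never has a group with 0 liberties
                planes[k - 1][r][c] = 1
    return planes
```

The point is that the flood fill encodes Go rules by hand; the network receives the result as extra input layers rather than having to discover the concept of a liberty itself.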

It is of course still groundbreaking academically. The architecture is a state-of-the-art deep learning setup and we learned a ton about how Go and games in general work. The interaction between supervised and reinforcement learning was interesting, especially how the latter behaved worse in practice in selecting most likely moves.

Disclaimer: Googler, not working on anything AI-related.

3 comments

Note that these features are for the fast rollout policy. It needs to be fast, so rather than a net they use a linear policy. For a linear policy to work, it requires good feature selection, which is what this is. In the future, when we have better hardware, you can imagine removing the rollout policy and having just one policy.
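
For readers unfamiliar with the term, a linear policy can be sketched roughly as below: each candidate move gets a hand-built feature vector, the score is a dot product with learned weights, and a softmax turns scores into move probabilities. The function name, feature vectors, and weights are made up for illustration and are not AlphaGo's actual features.

```python
import math


def linear_softmax_policy(features_per_move, weights):
    """Return one probability per candidate move.

    features_per_move: list of feature vectors, one per legal move
    weights: one weight per feature (hypothetical values)
    """
    scores = [
        sum(w * f for w, f in zip(weights, feats))
        for feats in features_per_move
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two candidate moves, two binary features each (illustrative only).
probs = linear_softmax_policy([[0, 1], [1, 0]], weights=[1.0, 0.0])
```

Since the model is just a weighted sum, it is very cheap to evaluate, but everything it can express has to be present in the features already, which is why feature selection matters so much.
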
I think you're confusing this list of attributes with a separate list used for rollouts and tree search. The ones above are definitely used as inputs to the policy and value neural networks. See: "Extended Data Table 2: Input features for neural networks. Feature planes used by the policy network (all but last feature) and value network (all features)."
I'm having trouble identifying what algorithmic innovation AlphaGo represents. It looks like a case of more_data + more_hardware. Some are making a big deal of the separate evaluation and policy networks. So, OK, you have an ensemble classifier.

The most theoretically interesting thing to me is the use of stochastic sampling to reduce the search space. Is there any discussion of how well pure Monte Carlo tree search performs here compared to the system incorporating extensive domain knowledge?
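
As a rough illustration of the sampling idea, here is flat Monte Carlo move evaluation on a toy game: Nim, where players take 1 to 3 stones and whoever takes the last stone wins. This is the simplest form of the idea, with no tree and no domain knowledge; it is not full MCTS and not AlphaGo's search. The game choice and all names are assumptions for illustration.

```python
import random


def legal_moves(pile):
    """A player may take 1, 2, or 3 stones, but no more than remain."""
    return [n for n in (1, 2, 3) if n <= pile]


def random_playout(pile, to_move, rng):
    """Play uniformly random moves from a non-empty pile to the end.
    Returns the winner (0 or 1): whoever takes the last stone."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return to_move
        to_move = 1 - to_move


def monte_carlo_move(pile, player=0, playouts=200, seed=0):
    """Estimate each move's win rate by random playouts; pick the best."""
    rng = random.Random(seed)
    win_rates = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:  # taking the last stone wins outright
            win_rates[move] = 1.0
            continue
        wins = sum(
            random_playout(remaining, 1 - player, rng) == player
            for _ in range(playouts)
        )
        win_rates[move] = wins / playouts
    best = max(win_rates, key=win_rates.get)
    return best, win_rates


best, rates = monte_carlo_move(3)
```

Each move is judged only by the average outcome of random continuations, so the cost scales with the number of playouts rather than with the size of the full game tree.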

Wow, this really took me by surprise. I thought the only input was (s_1...s_final, whowon), where the s are states, during training and (s_current) during play, and the system would learn the game on its own. That's the way it worked with the Atari games anyway.
I expect the Atari games, if we're thinking of the same articles, had much less strategic depth than playing a Go champion.