by deepnet 3882 days ago
This Facebook Video suggests it has had months of either training or development, which sounds like it might be learning by playing. Reinforcement learning in this space could be very interesting - more details would be good.

The two breakthrough papers used Deep Conv Nets to play Go, trained on Expert Games to predict the Expert's Next Move.

This is made possible by the huge archives of expert play from online Go servers that have reached Big Data sizes in the last few years.

Teaching Deep Convolutional Neural Networks to Play Go - Clark & Storkey Edinburgh Informatics 2015 http://arxiv.org/abs/1412.3409

Move Evaluation in Go Using Deep Convolutional Neural Networks Maddison, Huang, Silver, Sutskever Deepmind 2015 http://arxiv.org/abs/1412.6564

David Silver has work on Reinforcement Learning and Go: "Reinforcement Learning of Local Shape in the Game of Go", Silver et al 2007. https://scholar.google.co.uk/citations?view_op=view_citation...

Go is scored by the territory a player occupies and surrounds, plus the opponent's pieces taken.

The current State of the Art programs use Monte Carlo Tree Search with each move being evaluated using random playouts - this is well suited to Go as it gives a reasonable estimate, for comparison, of the territory that will result from a move.
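To make the random playout idea concrete, here is a minimal sketch. It is flat Monte Carlo evaluation (no tree, no UCT) and it uses tic-tac-toe as a toy stand-in for Go, since a full Go rules engine would not fit in a comment. All names (`evaluate_moves`, `random_playout`, etc.) are made up for illustration and are not taken from any real Go program.

```python
import random

# Toy stand-in for Go (an assumption for brevity): tic-tac-toe on 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of empty cells."""
    return [i for i, cell in enumerate(board) if cell == '.']


def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end; return the winner, or None for a draw."""
    board = list(board)
    while winner(board) is None and legal_moves(board):
        board[rng.choice(legal_moves(board))] = to_move
        to_move = 'O' if to_move == 'X' else 'X'
    return winner(board)


def evaluate_moves(board, player, playouts=200, seed=0):
    """Estimate each legal move's win rate for `player` from random playouts."""
    rng = random.Random(seed)
    opponent = 'O' if player == 'X' else 'X'
    scores = {}
    for move in legal_moves(board):
        child = list(board)
        child[move] = player
        wins = sum(random_playout(child, opponent, rng) == player
                   for _ in range(playouts))
        scores[move] = wins / playouts
    return scores


if __name__ == "__main__":
    scores = evaluate_moves(list("XX.OO...."), "X")
    print(max(scores, key=scores.get), scores)
```

The real engines grow a search tree on top of this and bias the playouts with heuristics, but the core estimate is the same: a move is judged by how often random continuations from it end well.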

As the Go search tree branches a lot (~200 branches per play) tree search takes some time and cannot be exhaustive.
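A rough back-of-envelope for why the search cannot be exhaustive, assuming a constant branching factor of 200 (the real number varies as the board fills up). The helper name `leaf_positions` is just for illustration.

```python
def leaf_positions(branching, depth):
    """Number of leaf positions in a uniform tree: branching ** depth."""
    return branching ** depth


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, leaf_positions(200, depth))
```

Four plies ahead is already 1.6 billion positions under this assumption.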

The Expert Move Predicting Convnets provide very good moves very quickly, and can provide the probability that the expert would move to each square in a single forward pass.
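As a sketch of what "probabilities for each square" means in practice: a network's final layer typically yields one raw score per board point, and a softmax turns those into a distribution over moves. The snippet below only shows that last step, assuming the scores already exist. The function name and the masking of occupied points are my own illustration, not details taken from either paper.

```python
import math


def move_probabilities(logits, occupied):
    """Softmax over board points, skipping points that are already occupied.

    `logits` holds one raw score per board point (assumed to come from a
    network); `occupied` is a set of point indices that cannot be played.
    """
    legal = [i for i in range(len(logits)) if i not in occupied]
    peak = max(logits[i] for i in legal)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in legal}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    # Four points instead of 361, purely to keep the example readable.
    print(move_probabilities([0.0, 1.0, 2.0, 3.0], occupied={3}))
```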

Most of the current development now centers around using the Convnets to prune the search trees for the random playout MCTS engines.
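One simple way this pruning could look, as a sketch: keep only the few moves the convnet rates highest and hand just those to the playout engine. The `top_k` and `min_prob` cutoffs are hypothetical knobs I made up, not values from any published engine.

```python
def prune_moves(move_probs, top_k=3, min_prob=0.01):
    """Keep at most `top_k` moves, highest probability first, dropping any below `min_prob`."""
    ranked = sorted(move_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [move for move, prob in ranked[:top_k] if prob >= min_prob]


if __name__ == "__main__":
    probs = {"D4": 0.5, "Q16": 0.3, "C3": 0.15, "K10": 0.045, "A1": 0.005}
    print(prune_moves(probs))  # only these candidates would get random playouts
```

With ~200 legal moves per position, cutting to a handful of candidates means far more playouts can be spent on each one.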

Go is useful from an AI perspective because it is a huge state space and the patterns are subtle with long term implications. No computer can yet beat the best humans - so it is an area where natural intelligence beats artificial.

Yet the convnets are not doing planning, only predicting what an expert human would do next. The planning is implicit in the dataset but not a part of the convnet itself (though the internals of deepnets are mysterious so who knows).

The Computer Go mailing list is very lively and full of the best program creators with regular computer tournaments against each other. http://computer-go.org/mailman/listinfo/computer-go

The Facebook video shows GnuGo as the opposing player. This is not the best computer player, which suggests Facebook's player is not as strong as the Convnets as yet. The video also does not detail how deep the playouts of the tree search are, i.e. what level GnuGo is playing at, so this cannot really be compared to recent work.

Hopefully Facebook will publish more details soon - if they are learning through play then their results may well be interesting.

Remi Coulom's excellent slides detail the Monte Carlo revolution in Go programs. http://www.remi-coulom.fr/JFFoS/JFFoS.pdf

AFAIK MCTS with random playouts and hand coded heuristics is still SOTA.

When I visited the Edinburgh Uni Informatics Department in the summer, convnet Go was in active development.