| HN Mirror

There might be some hard-to-release infrastructure code for the MCTS part, certainly, but the model on its own should be a standard TF CNN model and highly competitive (and people can write their own MCTS wrapper; it's not that complex an algorithm). Nothing in the AG paper or in statements since has hinted at anything as exotic as synthetic gradients*, and there is no reason to use synthetic gradients in AG. (In RL applications the NNs are generally small because the rewards provide so little supervision that a large NN would overfit grossly; a NN so large that it needed synthetic gradients to be split across GPUs would be simply catastrophically bad. Plus, the input, a 19x19 board with a few planes of metadata and other details encapsulating the state, is small compared to many applications like image labeling, which further reduces the benefits of size. Silver has said AG is now 40 layers, but that's not much compared to the 1000-layer ResNet monsters, and even those 40 layers are probably thin, since it's depth, not width, that buys more serial computation. That makes for a model with relatively few parameters overall.)
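To make the "thin but deep means few parameters" point concrete, here is a rough parameter count for a hypothetical plain stack of 3x3 convolutions. The 48 input planes, the 192/384 filter widths, the lack of residual connections or batch norm, and the single-plane 1x1 output head are all illustrative assumptions, not AlphaGo's actual architecture.

```python
def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights plus biases of one 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def tower_params(depth: int, width: int, in_planes: int) -> int:
    """Parameters of a hypothetical plain stack of `depth` 3x3 conv layers with
    `width` filters each, followed by a 1x1 conv down to one output plane.
    Assumed layout for illustration only (no residual blocks, no batch norm)."""
    total = conv_params(in_planes, width)             # first layer reads the input feature planes
    total += (depth - 1) * conv_params(width, width)  # remaining hidden layers, width -> width
    total += conv_params(width, 1, kernel=1)          # hypothetical 1x1 output head
    return total


if __name__ == "__main__":
    base = tower_params(depth=40, width=192, in_planes=48)
    deeper = tower_params(depth=80, width=192, in_planes=48)
    wider = tower_params(depth=40, width=384, in_planes=48)
    print(f"40 layers x 192 filters: {base:,}")
    print(f"80 layers x 192 filters: {deeper:,} ({deeper / base:.2f}x)")
    print(f"40 layers x 384 filters: {wider:,} ({wider / base:.2f}x)")
```

Under these assumptions, doubling depth roughly doubles the parameter count, while doubling width roughly quadruples it, because each hidden layer costs about 9 x width² weights. The 19x19 board size never appears: convolution weights are shared across board points, so board size affects compute per layer but not parameter count.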

* I find synthetic gradients super cool, and I've been reading DeepMind papers closely for hints of their use anywhere, and have been disappointed to see that the idea doesn't appear to be going anywhere. The only follow-up so far has been https://arxiv.org/abs/1703.00522, which is more of a dissection and further explanation of the original paper than an extension or application.