Hacker News
by WilliamDhalgren 3315 days ago
> AlphaGo essentially baked good moves into value and policy network by playing millions of times.

I don't think that's a very good description of how AlphaGo was trained at all; you're essentially saying it merely overfits the training set, yet it clearly generalizes rather well to unseen board situations and still evaluates them successfully. No machine learning system would be considered useful if all it could do was memorize the training data.

Re the use of deep reinforcement learning: for one, the role of reinforcement learning in the first version of AlphaGo, the one described in the Nature paper, was rather limited and a small part of its training; it just made a ~3d KGS policy network into a ~5d KGS bot, and was used to generate a training sample for the value net. If we had enough recorded human games to train the value net directly, that'd be an unnecessary step anyhow. And you could create such a training set w/o reinforcement learning, since there are pure Monte Carlo bots stronger than 5d KGS - but that'd be far more computationally expensive.

But it's still not really true that there aren't obvious applications of deep reinforcement learning - indeed, robotics is one promising application, and that seems rather relevant. This paper initially demonstrated an impressive improvement in manipulation tasks, and you can prob follow its numerous citations for newer stuff: http://arxiv.org/abs/1504.00702

I do agree that this exact architecture in AlphaGo prob doesn't have applications beyond just teaching us how to play go better; it seems too specialized. I believe they mean it in just the vaguest possible sense; that the kind of deep algorithms demonstrating incredible performance in AlphaGo have diverse applications; but this should not come as a surprise to anyone even loosely following what people have done with deep learning in the past couple of years anyhow.

2 comments

Go works precisely because it is a small closed system. An interesting match (from an AI perspective) would be a pro playing AlphaGo on an unusual board (e.g., one in the shape of a cat). The pro would take everything he knows about the game and apply it to the odd situation. AlphaGo is so specifically tuned that it cannot even handle any case except 19x19 (and maybe 9x9). Another interesting question would be small rule changes like "you may not play on any star points or any point directly touching them until turn 30".

Go has deep strategy, but it is very well defined in terms of what can and cannot be done and those rules are not particularly complex. Power grids in contrast are far more complex. There are thousands of rules, but also many more thousands of unwritten assumptions and case-by-case analysis. A final issue is that there exist unsolved and unrecognized problems.

The last AI winter (deep learning is just the latest rebrand) came from researchers overstating their accomplishments and making promises about general intelligence that could not be kept. Any claim about anything that requires general intelligence in the near future is undoubtedly overpromising.

> AlphaGo is so specifically tuned that it cannot even handle any case except 19x19 (and maybe 9x9).

Do you have any sources to back this assertion? It sounds counterintuitive, as I know object recognition systems are usually trained on small images but generalize well to arbitrary image sizes. What you are describing sounds like overfitting.

The paper itself says that the input to the policy network is a stack of 48 feature planes, each a 19x19 matrix. To make the point though, they initially train AlphaGo using actual games. After a hundred thousand or so training games, it's finally ready to start playing and learning. There are fewer than a couple dozen recorded games on larger boards.

If you haven't played go very much, you may think that "it's just a bigger board". 19x19 is commonly used because it has an even balance of edge and center influence (in reality, edge influence seems to be slightly higher). On 13x13, corner plays have overwhelming influence on the center. At 9x9, there is basically no center strategy at all. Normal strategies starting in the corners and expanding influence toward the center don't work as effectively with larger boards (the larger the board, the more this becomes true).

This is a much different issue than image recognition in that strategy doesn't scale in the same way that images do.
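To put rough numbers on the edge-versus-center balance described above, here is a minimal sketch. It assumes, purely for illustration, that "edge" means any point on the first four lines from a side; the `edge_fraction` helper and that cutoff are not from the paper or from any go program.

```python
def edge_fraction(size: int, edge_lines: int = 4) -> float:
    """Fraction of board points lying within `edge_lines` lines of any edge.

    The 4-line cutoff is an illustrative assumption, not an official definition.
    """
    # Side length of the central square that is farther than `edge_lines` from every edge.
    interior_side = max(size - 2 * edge_lines, 0)
    total = size * size
    return (total - interior_side ** 2) / total


for size in (9, 13, 19):
    print(f"{size}x{size}: {edge_fraction(size):.0%} of points are within 4 lines of an edge")
```

Under that assumption almost the whole 9x9 board counts as edge, while on 19x19 about a third of the points are true center, which is one way to see why strategy does not simply rescale with the board.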

I'm a huge proponent of AlphaGo and I think it is a revolutionary leap.

The key, I think, is:

> yet it clearly generalizes rather well to unseen board situations and still evaluates them successfully

I'm not sure this has been proven to be meaningful in a general sense, as you seem to also imply. Extrapolation can be a tricky and subtle business. What about unusual board sizes, for which no training data exists? Or if you changed a rule? I'm sure DeepMind would say the adversarial approach would work for these cases, but I'm not sure it would. It would be very interesting to see if humans could 'learn' a new state more quickly than the algorithm.

That might provide a hint as to whether the algorithm is 'just' fitting the data well (with appropriate baked-in regularization, of course), or whether it can more generally 'learn' given system rules.

Hm, well, you are no doubt right that it doesn't generalize well to a change of rules. Reminds me of that game DeepZen played. It was trained with a komi of 7.5, and it played too soft and lost when the komi in the actual match was 6.5 (or maybe it was the other way around?). A human does not have much trouble adapting to such small rule variations, but at least the version of DeepZen that played that match was hard-coded for that exact komi value, because that's what was used in all of its training examples, and it wasn't given as a parameter. It shouldn't be a hard limit of the approach - indeed, I think AyaMC was said to have been trained with some flexibility in its komi.
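A toy sketch of why a one-point komi difference matters, and what "given as a parameter" means here. The `winner` function and the scores are hypothetical, made up for illustration; this is not how DeepZen or any real bot is implemented.

```python
def winner(black_points: float, white_points: float, komi: float) -> str:
    """Return "B" or "W" once komi has been added to White's score.

    Hypothetical helper: komi is passed in explicitly rather than hard-coded.
    """
    return "B" if black_points > white_points + komi else "W"


# Same final position (Black ahead by 7 on the board), two komi values.
print(winner(184, 177, komi=6.5))  # B
print(winner(184, 177, komi=7.5))  # W
```

A bot whose evaluation has implicitly learned one komi will treat that 7-point lead as safely won (or lost) under the wrong rule, which is consistent with the "played too soft" behaviour described above.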

Still, I think AlphaGo does demonstrate amazing positional judgement in unseen board states, and that this is visible in the details of how it plays out particular situations. No two games are exactly alike - the difficulty of go for computers is precisely in its extreme combinatorial explosion - and in particular tactical situations every detail of the situation matters. Yet you can see AlphaGo judging the correct sequences of moves, "knowing" how to make a particular group alive, for example, even when a particular other move seems more natural. And probably the most amazing thing about how it plays is how early it becomes completely sure that it's got an advantage on the board, and how precisely it judges how much it needs to keep the advantage to the end. Every detail of the board is again relevant here, and basically no human would be so confident so soon. A go bot that couldn't adapt its tactics to unseen situations would be easy to beat; just ensnare it in a large complicated fight, and you're going to kill a big group and guarantee a win. Ofc people tried this in some masterP games, and it turns out AlphaGo is tactically just as strong.

So, it's basically like with other generalizations you can get from machine learning; a net trained on, say, ImageNet will generalize to different poses, occlusions, contexts and variations of objects similar to what it was exposed to in training etc., and still do a superhuman job of classifying such pictures, but will naturally be quite hopeless with completely unseen items. So too AlphaGo seems to know the game of go, generalizing from seen examples to correct judgements in other states, but would be quite hopeless if tested on even a slight variation of the game rules.