HN Mirror

Hacker News


	by sawwit 3799 days ago
	Great achievement. To summarize, I believe what they do is roughly this: First, they take a large collection of Go moves from expert players and learn a mapping from positions to moves (a policy) using a convolutional neural network that takes the 19 x 19 board as input. Then they refine a copy of this mapping with reinforcement learning, letting the program play against other instances of the same program. In addition, they train a mapping from a position to the probability that it will result in winning the game (the value of that state). With these two networks they navigate through the state space: first they propose a few learned expert moves for the current board with the first neural network. Then they check the values of these moves and branch out over the best ones (among other heuristics). When some termination criterion is met, they pick the first move of the best branch, and then it's the other player's turn.
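
A minimal sketch of the "navigate through the state space until a termination criterion is met, then play a move" loop, i.e. Monte Carlo tree search. Assumptions (none of this is AlphaGo's code): a toy game stands in for Go (players take 1-3 stones, whoever takes the last stone wins), uniformly random playouts and the plain UCT rule stand in for the policy and value networks, the termination criterion is simply a fixed simulation budget, and the move played is the most-visited root move (the usual MCTS choice). All names are hypothetical.

```python
import math
import random

# Toy stand-in for Go (assumption): take 1-3 stones; taking the last stone wins.
def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left; the player to move acts here
        self.parent = parent
        self.move = move                # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who made `move`

def uct_select(node, c=1.4):
    """Child maximising average result plus an exploration bonus (UCT)."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, rng):
    """Uniformly random playout; True if the player to move at `stones` wins."""
    original_to_play = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        original_to_play = not original_to_play
    # Whoever took the last stone won, i.e. the side that is *not* to play now.
    return not original_to_play

def mcts(stones, simulations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(simulations):        # termination criterion: fixed budget
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: result from the viewpoint of the player who moved into `node`.
        result = 0.0 if rollout(node.stones, rng) else 1.0
        # 4. Backpropagation: the viewpoint flips at every level up the tree.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this toy game, taking 1 from a pile of 5 (leaving a multiple of 4) is the winning move, which is what `mcts(5)` should settle on given enough simulations.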

sillysaurus3 3799 days ago

> they also train a mapping from the board state to a probability of how likely it is that a particular move will result in winning the game (the value of a particular move).

How is this calculated?

> When some termination criterion is met

Were these criteria learned automatically, or coded/tweaked manually?

sawwit 3799 days ago

1. The value network is trained with gradient descent to minimize the difference between the predicted outcome of a given board position and the actual final outcome of the game. The positions used for this training actually come from games played by the refined policy network; but the original policy turns out to perform better inside the search (they conjecture this is because it contains more creative moves, which are kind of averaged out in the refined one). I'm wondering why the value network can be trained better with the refined policy network.

2. They just run a fixed number of simulations, i.e. they compute n different branches all the way to the end of the game, guided by various heuristics.
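
A minimal sketch of the training objective in point 1, assuming a tiny linear model with a tanh output in place of the convolutional value network, hypothetical two-number position features, and outcomes z = +1 for a win and -1 for a loss. It runs plain gradient descent on the squared error (v(s) - z)^2 between the predicted value and the final result:

```python
import math

def predict(weights, features):
    """Value estimate in (-1, 1); tanh of a linear score stands in for the value network."""
    return math.tanh(sum(w * x for w, x in zip(weights, features)))

def train_value(positions, outcomes, lr=0.1, epochs=200):
    """Gradient descent on (v(s) - z)^2, with z = +1 for a win and -1 for a loss."""
    weights = [0.0] * len(positions[0])
    for _ in range(epochs):
        for features, z in zip(positions, outcomes):
            v = predict(weights, features)
            # d/dw (v - z)^2 = 2 (v - z) (1 - v^2) x, by the chain rule through tanh.
            scale = 2.0 * (v - z) * (1.0 - v * v)
            weights = [w - lr * scale * x for w, x in zip(weights, features)]
    return weights

# Hypothetical data: two "positions", each described by two features.
positions = [[1.0, 0.0], [0.0, 1.0]]
outcomes = [1.0, -1.0]
weights = train_value(positions, outcomes)
```

In the real setup the positions would come from self-play games of the refined policy network, as described in point 1; here the data is made up purely to show the loss and the update.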

someotheridiot 3799 days ago

If their learning material is based on expert human games, how can it ever get better than that?

brian_cloutier 3799 days ago

This was the question that originally led me to lose faith in deep learning for solving Go.

Existing research throws a bunch of professional games at a deep convolutional neural network (DCNN) and trains it to predict the next move.

It generally does quite well but fails hilariously when you give it a situation which never comes up in pro games. Go involves lots of implicit threats which are rarely carried out. These networks learn to make the threats but, lacking training data, are incapable of following up.

The first step of creating AlphaGo worked the same way (though, according to the paper, that network actually predicted the next move better than the previous state of the art: 57% accuracy versus 44%), but DeepMind then took that base network and retrained it. Instead of playing the move a pro would play, it now plays the move most likely to result in a win.

For pros, this is the same move. But for AlphaGo, in its completely different Monte Carlo tree search (MCTS) environment, the two are quite different. DeepMind then played the engine against older versions of itself and used reinforcement learning to make the network as accurate as possible.

They effectively used the human data to bootstrap a better player. The paper used a lot of other cool techniques and optimizations, but I think this one might be the coolest.
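
A toy sketch of the retraining step described above: a policy plays against randomly chosen older snapshots of itself and is nudged toward moves that led to wins, using a REINFORCE-style policy gradient with reward z = +1 for a win and -1 for a loss. Assumptions: a tabular softmax policy over a toy take-1-to-3-stones game instead of a network over Go positions, and hypothetical hyperparameters (learning rate, number of games, snapshot interval):

```python
import copy
import math
import random

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; whoever takes the last stone wins

def policy_probs(logits, stones):
    """Softmax over the legal moves at this pile size."""
    legal = [m for m in MOVES if m <= stones]
    exps = [math.exp(logits[stones][m - 1]) for m in legal]
    total = sum(exps)
    return legal, [e / total for e in exps]

def play(learner, opponent, stones, rng):
    """One game with the learner moving first; returns its (state, move) pairs and z."""
    trajectory, learner_turn = [], True
    while True:
        legal, probs = policy_probs(learner if learner_turn else opponent, stones)
        move = rng.choices(legal, weights=probs)[0]
        if learner_turn:
            trajectory.append((stones, move))
        stones -= move
        if stones == 0:
            return trajectory, (1.0 if learner_turn else -1.0)
        learner_turn = not learner_turn

def reinforce_update(logits, trajectory, z, lr=0.1):
    """REINFORCE: push each played move's log-probability in the direction of z."""
    for stones, move in trajectory:
        legal, probs = policy_probs(logits, stones)
        for m, p in zip(legal, probs):
            # d log pi(move) / d logit(m) = 1[m == move] - p(m)
            logits[stones][m - 1] += lr * z * ((1.0 if m == move else 0.0) - p)

def self_play(start=5, games=3000, snapshot_every=200, seed=0):
    rng = random.Random(seed)
    learner = {s: [0.0, 0.0, 0.0] for s in range(1, start + 1)}
    pool = [copy.deepcopy(learner)]     # frozen older versions of the policy
    for g in range(1, games + 1):
        opponent = rng.choice(pool)     # play a randomly chosen older version
        trajectory, z = play(learner, opponent, start, rng)
        reinforce_update(learner, trajectory, z)
        if g % snapshot_every == 0:
            pool.append(copy.deepcopy(learner))
    return learner
```

After `self_play()`, the policy at a pile of 5 should favour taking 1, the move that wins rather than the one a random or "average" player would pick.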

mourner 3799 days ago

Fantastic explanation, thank you!

space_fountain 3799 days ago

How can a human ever get better than their teacher?

In this case, though, they play and optimize against themselves.

kazinator 3798 days ago

> How can a human ever get better than their teacher?

By learning from other teachers, and by applying original thought. Also, due to innately superior intelligence. If your IQ is 140, and that of the teacher is 105, you will eventually outstrip the teacher.

jibalt 3796 days ago

The question was rhetorical. And what is needed is aptitude for the specific task, not "IQ" ... the two are often very different.

yvsong 3799 days ago

I concluded that the secret of Go Seigen, the all-time no. 1 master, is: 1. learn from all masters; 2. keep inventing/innovating. Most experts do 1 well and are pretty much stuck there; few are good at 2. I doubt if computers can invent/innovate.

kitd 3799 days ago

I would have thought (he says casually) that some kind of genetic algorithm of introducing random moves and evaluating outcomes for success would be entirely possible, no?
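
The simplest relative of that idea is a (1+1) evolutionary algorithm: random mutation plus "keep it if it is not worse" selection (no population or crossover, so not a full genetic algorithm). A minimal sketch, assuming a hypothetical toy fitness function (counting 1-bits) in place of actually evaluating Go outcomes, which would require playing games:

```python
import random

def one_plus_one_ea(fitness, length, generations, seed=0):
    """(1+1) evolutionary algorithm: random mutation, keep-if-not-worse selection."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    best = fitness(parent)
    for _ in range(generations):
        # Mutation: flip each bit independently with probability 1/length.
        child = [1 - b if rng.random() < 1.0 / length else b for b in parent]
        score = fitness(child)
        # Selection: most mutations are useless or harmful; keep only non-worse ones.
        if score >= best:
            parent, best = child, score
    return parent, best

# Hypothetical fitness: number of 1-bits ("OneMax"), standing in for "evaluate the outcome".
solution, score = one_plus_one_ea(sum, length=20, generations=5000)
```

Most random changes are discarded, but selection keeps the few useful ones, which is the crux of the "how many random moves are useful?" question.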

DanBC 3799 days ago

There's a large space of random moves. How many are likely to be useful?

jibalt 3796 days ago

Do you ask that of natural evolution, too?

jibalt 3796 days ago

"I doubt if computers can invent/innovate."

Sheer ignorance.

sawwit 3799 days ago

It's because they have a much larger stack size than a human brain (which does not have a stack at all, just various kinds of short-term memory). An expert Go player can realistically consider maybe 2-3 moves into the future and have a rough idea of what will happen in the coming 10 moves, while this method does tree search all the way to the end of the game on multiple alternative paths for each move.

donmaq 3799 days ago

Not true. Professional Go players read out 20+ moves consistently. Go Seigen's nemesis Kitani Minoru regularly read out 30-40 moves.

As an AGA amateur 4 dan, I read out 10 moves pretty regularly, including variations. And if the sequence includes joseki (known optimal sequences of 15-20+ moves), pros will read even deeper...

sawwit 3799 days ago

Yes, the latter number was perhaps too conservative; deeper predictions are no doubt easily possible, but I doubt even expert players consider many alternative paths in the search tree. They might recognize overall strategies that reach many moves into the future, but extensive consideration of what will happen in the upcoming moves is probably constrained to only a few steps, at least relative to the number and depth of paths that AlphaGo considers.

jibalt 3796 days ago

"while this method does tree search all the way to the end of the game"

No it doesn't. You seem quite happy to just make stuff up that you know nothing about, like "2-3 moves into the future".
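
For reference, my understanding of the AlphaGo paper is that it sits between these two claims: the search tree itself is grown only to a limited depth, and each new leaf is scored by mixing the value network's estimate v with the result z of a fast rollout played to the end of the game, V = (1 - lam) v + lam z. Inside the tree, moves are chosen by maximizing Q + u, where u grows with the policy network's prior P and shrinks with the visit count N. A minimal sketch of those two formulas, with hypothetical statistics and illustrative values for c_puct and lam:

```python
import math

def select_move(stats, c_puct=5.0):
    """stats maps move -> {"P": prior from the policy net, "N": visits, "W": total value}.
    Returns argmax of Q + u, where u favours high-prior, rarely visited moves."""
    total_visits = sum(s["N"] for s in stats.values())

    def score(s):
        q = s["W"] / s["N"] if s["N"] else 0.0
        u = c_puct * s["P"] * math.sqrt(total_visits) / (1 + s["N"])
        return q + u

    return max(stats, key=lambda move: score(stats[move]))

def leaf_value(value_estimate, rollout_result, lam=0.5):
    """Mix the value network's estimate with a rollout result z in {-1, +1}."""
    return (1 - lam) * value_estimate + lam * rollout_result

# Hypothetical statistics for two candidate moves after 20 simulations.
stats = {
    "a": {"P": 0.5, "N": 10, "W": 2.0},
    "b": {"P": 0.5, "N": 10, "W": 8.0},
}
```

With equal priors and visits, `select_move(stats)` prefers "b" for its higher average value; with few visits, a high prior can dominate instead.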

reddytowns 3799 days ago

If you pitted one expert against a room full of experts who collectively decided on the next move, who would win?

blackskad 3799 days ago

The one expert, because the others would not be able to reach a decision on which move to play.

reacweb 3799 days ago

In fact, no. A big group of average experts appears to be better than a single super expert. This is the principal justification for the success of AI in oil prospecting (https://books.google.fr/books?id=6DNgIzFNSZsC&pg=SA30-PA5&lp...)

Jach 3799 days ago

Counterpoint: https://en.wikipedia.org/wiki/Kasparov_versus_the_World

I think a key missing component to crowd success on real expert knowledge (as opposed to trivia) is captured by the concept of prediction markets. (https://en.wikipedia.org/wiki/Prediction_market) The experts who are correct will make more money than the incorrect ones and eventually drive them out of the market for some particular area.
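
A real prediction market is beyond a short example, but the "correct experts gain influence, incorrect ones get driven out" dynamic has a well-known algorithmic cousin: the multiplicative-weights (weighted majority) update. This is a swapped-in technique, not a market. A minimal sketch with hypothetical expert predictions and an illustrative penalty eta:

```python
def update_weights(weights, predictions, outcome, eta=0.5):
    """Multiply each wrong expert's weight by (1 - eta), then renormalise to sum to 1."""
    new = [w * (1 - eta) if p != outcome else w
           for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]

def weighted_vote(weights, predictions):
    """Combined prediction: the option with the most total weight behind it."""
    tally = {}
    for w, p in zip(weights, predictions):
        tally[p] = tally.get(p, 0.0) + w
    return max(tally, key=tally.get)

# Hypothetical: expert 0 is always right, expert 1 always wrong, expert 2 always says 1.
weights = [1 / 3] * 3
for outcome in [1, 0, 1, 1]:
    predictions = [outcome, 1 - outcome, 1]
    weights = update_weights(weights, predictions, outcome)
```

After these four rounds the always-correct expert holds most of the weight, so it can outvote the other two combined.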

jibalt 3796 days ago

That's no counterpoint because the World team (of which I was a member) was made up of boobs on the internet, not players of Kasparov's strength, which was the premise of the question you responded to.

blackskad 3799 days ago

The easy thing about combining AI systems is that they don't argue. They don't try to change the opinions of the other experts, and they don't try to argue with the entity that combines all the opinions: every AI expert gets to state its opinion once.

With humans on the other hand, there will always be some discussion. And some human experts may be better at persuading other human experts or the combining entity.

I think it would be an interesting thing to try after they beat the number 1 player. Gather the top 10 (human) Go players and let them play as a team against AlphaGo.
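
The "every AI expert states its opinion once" combiner described above can be as simple as averaging. A minimal sketch, assuming each system outputs a hypothetical win-probability estimate for each candidate move:

```python
def combine_once(evaluations):
    """evaluations: one dict per system mapping move -> estimated win probability.
    Each system reports once; the combiner averages and picks the best move."""
    moves = evaluations[0].keys()
    average = {m: sum(e[m] for e in evaluations) / len(evaluations) for m in moves}
    best = max(average, key=average.get)
    return best, average

# Hypothetical estimates from three systems for two candidate moves.
evaluations = [
    {"A": 0.60, "B": 0.40},
    {"A": 0.30, "B": 0.50},
    {"A": 0.70, "B": 0.45},
]
best, average = combine_once(evaluations)
```

No system ever sees the others' numbers, so there is no persuasion step; the combiner alone decides.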

jibalt 3796 days ago

This is nonsense. To combine AI systems requires a mechanism to combine their evaluations. The most effective way would be a feedback system, in which each system uses the evaluations of the other systems as input to possibly modify its own evaluation, with the goal being consensus. This is simply a formalization of argumentation, which can be rational; it doesn't have to be based on personal benefit. And generalized AI systems may well some day have personal motivations, as has been discussed at length.
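
One simple way to formalize "each system uses the others' evaluations as input until they agree" is DeGroot-style iterative averaging. A minimal sketch, assuming fixed, equal trust between systems (the self_weight value and the estimates are hypothetical):

```python
def consensus(estimates, self_weight=0.5, tol=1e-9, max_rounds=1000):
    """Each round, every system replaces its estimate with a mix of its own estimate
    and the mean of the others'. Returns the final estimates and rounds used."""
    x = list(estimates)
    n = len(x)
    for rounds in range(1, max_rounds + 1):
        total = sum(x)
        x = [self_weight * xi + (1 - self_weight) * (total - xi) / (n - 1) for xi in x]
        if max(x) - min(x) < tol:
            return x, rounds
    return x, max_rounds

final, rounds = consensus([0.2, 0.5, 0.8])
```

With these symmetric weights the systems converge to the plain average of their starting estimates; unequal trust weights would pull the consensus toward the more trusted systems.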

panglott 3797 days ago

This reminds me of the story of the Game of the Century, with Go Seigen's shinfuseki. https://en.wikipedia.org/wiki/List_of_go_games#.22The_Game_o...

https://en.wikipedia.org/wiki/Shinfuseki

zodiac 3799 days ago

The expert human games are used just to predict future moves.
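
A minimal sketch of what "used to predict future moves" means as a training objective: maximize the log-probability the policy assigns to the expert's move in each position. Assumptions: a softmax over a linear score per move instead of a convolutional network, and hypothetical feature vectors and move indices:

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def move_probs(W, x):
    """Probability of each move given position features x."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def train_policy(examples, n_features, n_moves, lr=0.5, epochs=100):
    """examples: (features, expert_move_index) pairs.
    Gradient ascent on log p(expert move | position)."""
    W = [[0.0] * n_features for _ in range(n_moves)]
    for _ in range(epochs):
        for x, a in examples:
            probs = move_probs(W, x)
            for k in range(n_moves):
                g = (1.0 if k == a else 0.0) - probs[k]   # d log p(a) / d score_k
                W[k] = [w + lr * g * xi for w, xi in zip(W[k], x)]
    return W

# Hypothetical data: two positions, each paired with the move an expert played.
examples = [([1.0, 0.0], 2), ([0.0, 1.0], 0)]
W = train_policy(examples, n_features=2, n_moves=3)
```

Nothing in this objective rewards winning; it only rewards agreeing with the expert, which is why a separate reinforcement learning stage is needed to go beyond the training data.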

ousta 3799 days ago

The key part is that they basically just play out all possible permutations, then the next permutations, and so on, get a probability of winning for each path, and take the best one. It is indeed a very artificial way to be intelligent.
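
For a sense of scale when enumerating permutations: the AlphaGo paper gives approximate figures for Go of breadth b ≈ 250 legal moves per position and depth d ≈ 150 moves per game. A quick check of b^d using those approximate figures:

```python
import math

# Approximate breadth and depth for Go, as quoted in the AlphaGo paper.
b, d = 250, 150
sequences = b ** d                       # exact integer (Python ints are unbounded)
print(len(str(sequences)))               # 360 decimal digits
print(round(d * math.log10(b), 1))       # 359.7, the same magnitude via logarithms
```

That is on the order of 10^359 move sequences. As I understand the paper, this is why it reduces the breadth of the search with the policy network and its depth with the value network, rather than enumerating every path.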
