Hacker News
by dsacco 3130 days ago
While I understand your cynicism about the practical applicability of a chess or Go-playing AI, I think you are significantly underestimating the theoretical innovations contributed to the field every time these models are substantially improved. Much of the work that goes into improving something like AlphaGo carries over to other research projects, and gradually trickles out into other domains with much more real-world impact.
3 comments

The basic problem with AlphaGo Zero is that the state of a Go game is fully deterministic, fully Markovian, and fully amenable to quick simulation. The player makes a move, and the simulator computes the next game-state in milliseconds from only the current game-state. This is what lets the AlphaGo Zero agent train so quickly on self-play.
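To make the "next state from only the current state" point concrete, here is a minimal sketch of self-play against a deterministic, Markovian simulator. It uses a toy Nim-style game (take 1 to 3 stones, whoever takes the last stone wins) as a stand-in for Go, and a uniform-random policy instead of a learned one. The names `State`, `legal_moves`, `step`, and `self_play` are hypothetical and chosen for illustration only; nothing here is taken from the AlphaGo Zero code.

```python
import random
from typing import List, Tuple

# Assumed toy state: (stones_left, player_to_move), where player is 0 or 1.
State = Tuple[int, int]


def legal_moves(state: State) -> List[int]:
    """A player may take 1, 2, or 3 stones, but no more than remain."""
    stones, _ = state
    return [m for m in (1, 2, 3) if m <= stones]


def step(state: State, move: int) -> State:
    """Deterministic, Markovian transition: the next state depends only on
    the current state and the chosen move, never on earlier history."""
    stones, player = state
    return (stones - move, 1 - player)


def self_play(start_stones: int, rng: random.Random) -> List[Tuple[State, int]]:
    """Play one game with a uniform-random policy. Each visited state is
    labeled +1 if the player to move there went on to win, else -1."""
    state: State = (start_stones, 0)
    visited: List[State] = []
    while state[0] > 0:
        visited.append(state)
        state = step(state, rng.choice(legal_moves(state)))
    # The game ended on the previous player's move, so that player won.
    winner = 1 - state[1]
    return [(s, 1 if s[1] == winner else -1) for s in visited]
```

Because `step` is a cheap pure function, something like `self_play(10, random.Random(0))` can be called as often as needed to produce labeled training positions. That is exactly the property that disappears once the environment needs an expensive or non-Markovian simulation.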

If you start requiring high-dimensional empirical data where the generating dynamics aren't Markovian (or aren't neatly predictable with a Markovian simulator, even if God considers them fully determined), you start having to do stuff like full-blown physics simulations while also specifying agent goals in terms of those physical states. Then you've got the machine learning part and the simulation part taking up comparable amounts of compute power, and self-supervised training becomes much more difficult.

I agree that partial observation and imperfect information present computational difficulties to generalization. Do you know of any interesting research offhand for reading about optimizations for this problem?
> I think you are significantly underestimating the theoretical innovations contributed to the field every time these models are substantially improved.

I think you are overestimating; there isn't a single interesting theoretical insight in AlphaGo's papers.

Can you define what you mean by “theoretical insight”? It’s true that AlphaGo was built using previously existing techniques (supervised learning, a large training dataset, reinforcement learning and Monte Carlo tree search). But if you consider something not to be a breakthrough because it does not literally introduce a novel fundamental technique, you have a very narrow view of research (in my opinion).

Here are a few points to consider:

1. The combination of the aforementioned techniques in AlphaGo was non-standard. Supervised learning bootstrapped reinforcement learning, which in turn produced the value function passed to the Monte Carlo tree search.

2. AlphaGo represents a new achievement in solving perfect information games. The research team has moved on to StarCraft, which is not perfect information, but they didn’t try to tackle that until they had conquered a complex perfect information game.

3. AlphaGo’s research team improved upon the original AlphaGo with a novel algorithm that masters the game purely through self-play, using the tree search itself as the policy improvement step. The new AlphaGo Zero does not utilize human training data or supervised learning, and it was capable of defeating the original AlphaGo 100-0.

Beyond self-play, I think that AlphaGo’s methodologies can generalize to combinatorial search problems even if they don’t generalize to broader domains like partially observed games or robotics.

I think folding the update rule inside the MCTS loop (in AlphaGo Zero) is genius.
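For anyone who hasn't read the paper, a rough sketch of what "folding the update rule inside the MCTS loop" amounts to: the search's visit counts become the policy target, the game outcome becomes the value target, and the network is trained toward both. The function names `search_policy` and `zero_style_loss` are made up for illustration, the inputs are plain lists rather than network outputs, and the paper's L2 weight penalty is left out. It also assumes at least one action has been visited.

```python
import math
from typing import List


def search_policy(visit_counts: List[int], temperature: float = 1.0) -> List[float]:
    """Turn MCTS visit counts N(s, a) into a target distribution with
    pi(a) proportional to N(s, a) ** (1 / temperature)."""
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)  # assumes at least one visit, so total > 0
    return [x / total for x in scaled]


def zero_style_loss(p: List[float], v: float, pi: List[float], z: float) -> float:
    """Squared value error plus cross-entropy between the search policy pi
    and the network policy p: (z - v)^2 - sum_a pi(a) * log p(a).
    The regularization term from the paper is omitted in this sketch."""
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0.0)
    return value_loss + policy_loss
```

With visit counts `[2, 6, 2]`, `search_policy` gives a target that puts most of its weight on the second move. Since the search is run with the current network's priors and tends to produce a sharper distribution than those priors, training toward it makes each search act as a policy improvement step.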
That is a big claim. Where's the detailed analysis and citations?
Playing devil's advocate. I casually agree that AlphaGo Zero was valuable, but if we were to put the onus on you...

What theoretical innovations did AlphaGo Zero provide?

I gave a brief overview of that in a parallel comment on this thread :)