by deepnet 3256 days ago
> particularly in programs like AlphaGo, which use an ‘internal model’ to analyse how actions lead to future outcomes in order to reason and plan.

I was under the impression that AlphaGo makes no plan, but responds to the current board state with expert move probabilities that prune the MCTS random playouts.

There is no plan or strategy in the AlphaGo papers (AFAIK), so I find the statement that AlphaGo is an imaginative planner quite curious.

Perhaps someone can reconcile these statements or correct my knowledge of AlphaGo?

Very interesting papers. It will be nice to see the imagination encoder methods applied to highly stochastic environments, or indeed to a robot in the real world.

1 comments

In AlphaGo, MCTS is used to explore many plans and select the best. As far as I know, it then executes only the first action of the selected plan and starts planning afresh for the next action. As such, it doesn't "stick to the plan", so you could say that it doesn't have a strategy. But the MCTS is definitely a planner.
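
To make the "plan, execute the first action, replan" loop concrete, here is a rough sketch on a toy game. Everything in it is an assumption for illustration: the game is a hypothetical Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and a plain exhaustive search stands in for MCTS to keep it short. It is not how AlphaGo is implemented.

```python
MAX_TAKE = 3  # hypothetical toy rule: remove 1-3 stones, taking the last stone wins


def plan(stones):
    """Search from the current position only.

    Returns (value, line): value is +1 if the player to move can force a win,
    -1 otherwise; line is the tuple of moves the search expects both sides to play.
    """
    best_value, best_line = -2, ()
    for take in range(1, min(MAX_TAKE, stones) + 1):
        if take == stones:
            # Taking the last stone wins immediately.
            value, line = 1, (take,)
        else:
            # The opponent's result, seen from our side, is negated.
            opp_value, opp_line = plan(stones - take)
            value, line = -opp_value, (take,) + opp_line
        if value > best_value:
            best_value, best_line = value, line
    return best_value, best_line


def play(stones):
    """Plan a whole line, execute only its first move, discard the rest, replan."""
    history = []
    while stones > 0:
        _, line = plan(stones)  # nothing is carried over from the previous turn
        move = line[0]
        history.append(move)
        stones -= move
    return history
```

Calling `play(5)` runs a fresh search at every turn. Each search produces a full line of play, but only its first move is ever used.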
Yes absolutely, I think your explication is perfectly correct.

Though (IMHO) MCTS is better characterised as evaluating moves rather than exploring plans.

The MCTS only explores the moves in order of likelihood using the most basic of heuristics, random playout.

The Net outputs likely moves based only on the current board position; it formulates no strategy.
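
One way to picture "exploring moves in order of likelihood" is a prior-weighted selection rule of the PUCT kind, where each move's score is its average value so far plus a bonus proportional to the net's prior and shrinking with visits. The sketch below is only an illustration under my own assumptions: the names `priors`, `visits`, `values` and `c_puct` are hypothetical, and the `+ 1` inside the square root is my own tweak so that the priors decide the first pick before anything has been visited.

```python
import math


def select_move(priors, visits, values, c_puct=1.0):
    """Return the move maximising Q + U for one position.

    priors: move -> probability from the net (hypothetical input)
    visits: move -> number of times the search has tried the move
    values: move -> sum of playout results for the move
    """
    total = sum(visits.values())

    def score(move):
        n = visits[move]
        q = values[move] / n if n else 0.0  # average result so far
        # Exploration bonus: large for likely, rarely tried moves.
        u = c_puct * priors[move] * math.sqrt(total + 1) / (1 + n)
        return q + u

    return max(priors, key=score)
```

With no visits the most probable move is tried first; once it has been tried often with poor results, the bonus moves attention to the next most likely move. All inputs describe the current position only.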

No state is stored across moves - each play is independent, relying only on the current board position.

I still don't see anything anywhere in AlphaGo that is a plan, trajectory or strategy.

Neither is there an evaluation of the opponent nor any attempt to outwit them.

That it performs so astonishingly well without a plan is very interesting and should perhaps give us pause: is planning a hubris? Do we undervalue our use of heuristics in our own behaviour?