by Radim
3259 days ago
On the topic of "likely exhausts memory": the best search methods use reversible state updates. Reversibility makes search very fast because you no longer need to copy every data structure on path expansion. Instead, you keep a single representation and modify it incrementally. When retracting a search step, you undo those same modifications and arrive at exactly the state you started from. This is of course non-trivial: it is much easier to copy everything and throw the copy away when it's no longer needed than to keep a single state incrementally consistent. But the gains from data locality (excellent caching), better memory management (no allocations, no fragmentation) and less work (you only touch and update the parts of the state that matter) can be tremendous. I haven't seen any discussion of DeepMind's implementation details for AlphaGo, but since they come from a game development background (David Silver was the CTO and lead dev at Elixir Studios), where every cycle counts, I have no doubt they're well familiar with all these concepts. But then the TPU throws a wrench into it again...
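To make the make/undo idea concrete, here is a minimal sketch. It uses N-Queens as a toy problem instead of Go, and the names `Board`, `make`, `undo` and `count_solutions` are invented for illustration. This is not how AlphaGo or any particular engine does it, only the general pattern of mutating one shared state and reverting it on the way back up.

```python
class Board:
    """One mutable search state, shared by the whole search (toy example)."""

    def __init__(self, n):
        self.n = n
        self.cols = [False] * n              # column already occupied?
        self.diag = [False] * (2 * n - 1)    # diagonal index: row + col
        self.anti = [False] * (2 * n - 1)    # anti-diagonal index: row - col + n - 1
        self.placed = []                     # chosen column for each filled row

    def can_place(self, row, col):
        return not (self.cols[col]
                    or self.diag[row + col]
                    or self.anti[row - col + self.n - 1])

    def make(self, row, col):
        # Apply a move by touching only the cells it affects; nothing is copied.
        self.cols[col] = True
        self.diag[row + col] = True
        self.anti[row - col + self.n - 1] = True
        self.placed.append(col)

    def undo(self, row, col):
        # Revert exactly what make() changed, restoring the previous state.
        self.placed.pop()
        self.anti[row - col + self.n - 1] = False
        self.diag[row + col] = False
        self.cols[col] = False


def count_solutions(board, row=0):
    """Depth-first search that expands and retracts on the same Board."""
    if row == board.n:
        return 1
    total = 0
    for col in range(board.n):
        if board.can_place(row, col):
            board.make(row, col)                       # expand
            total += count_solutions(board, row + 1)
            board.undo(row, col)                       # retract
    return total
```

Calling `count_solutions(Board(8))` counts the 92 solutions of the 8-queens puzzle while allocating only one `Board`. After the call returns, every flag is `False` again and `placed` is empty, which is the "exact same state as when you started" property. A copy-based version would instead clone the three lists at every expansion.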
|
MCTS playouts do not need to backtrack (they are just greedy probes), so this is irrelevant. Do not confuse backtracking with the backpropagation step in MCTS.
I do not see the connection to TPU.