|
|
|
|
|
by HarHarVeryFunny
471 days ago
|
|
First there was AlphaGo, which had learnt from human games, then further improved from self-play, then there was AlphaGo Zero which taught itself from scratch just by self-play, not using any human data at all. Game programs like AlphaGo and AlphaZero (chess) are all brute force at core - using MCTS (Monte Carlo Tree Search) to project all potential branching game continuations many moves ahead. Where the intelligence/heuristics comes to play is in pruning away unpromising branches from this expanding tree to keep the search space under control; this is done by using a board evaluation function to assess the strength of a given considered board position and assess if it is worth continuing to evaluate that potential line of play. In DeepBlue (old IBM "chess computer" that beat Kasparov) the board evalation function was hand written using human chess expertise. In modern neural-net based engines such as AlphaGo and AlphaZero, the board evaluation function is learnt - either from human games and/or from self-play, learning what positions lead to winning outcomes. So, not just brute force, but that (MCTS) is still the core of the algorithm. |
|