by VHRanger
3283 days ago
They've been doing similar things in chess for a long time. For the record, the strategy in AlphaGo is to pretrain an "intuition" by supervised learning: they did deep learning on expert games. This is mainly useful for early-game moves. For later play they improved on the supervised policy with self-play reinforcement learning, and the search itself is MCTS (Monte Carlo tree search). In chess they did the first part by basically stealing knowledge from "opening books", and they do the later-game parts with alpha-beta pruning. In poker they recently used this strategy (look for the "DeepStack" poker bot from UAlberta) and were quite successful. The self-play algorithm later on is counterfactual regret minimization (CFR). The best poker bot (from CMU) uses CFR for both the early- and late-game parts, with different levels of coarse-graining.
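For anyone who hasn't seen alpha-beta pruning, here is a rough sketch of the idea on a toy game tree. This is not how any real chess engine is written; the nested-list tree format and the function names `alphabeta` and `minimax` are made up for illustration. A leaf is a number (a static evaluation), an inner node is a list of children, and the players alternate between maximizing and minimizing.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of a toy game tree, skipping branches that cannot matter."""
    # Hypothetical tree format: a leaf is a number, an inner node is a list.
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return best


def minimax(node, maximizing=True):
    """Plain minimax without pruning, used only to cross-check the result."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree), minimax(tree))
```

Both calls agree on the value, but `alphabeta` never looks at the leaves 9 and 1: once a branch is known to be worse than one already explored, the rest of it is skipped. Real engines add move ordering, depth limits and an evaluation function on top of this.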