|
|
|
|
|
by algo_trader
2190 days ago
|
|
Its great when you can indeed iterate without 100s of GPU/hrs. Are there any papers/comparisons/tradeoffs on when GBDT predictive-power plateaus compared to a NN? EDIT: with self play you can trade-off a cpu budget for both the GBDT depth, a NN depth, and the roll-out depth - which is super interesting |
|
None specifically that I know of, but I haven't searched.
"Shallow learning" GBDTs can do pretty well on MNIST (https://www.kaggle.com/c/digit-recognizer/discussion/61480), getting 98%+ accuracy compared to the 99%+ of NNs. So I figured if they can handle MNIST, they can probably handle connect 4, and would be useful to explore the self-play training efficiency aspects of AlphaZero (at orders of magnitude less compute time/cost)
> with self play you can trade-off a cpu budget for both the GBDT depth, a NN depth, and the roll-out depth - which is super interesting.
Definitely. It'll be interesting to see if a deeper MCTS search with a less powerful model can do pretty well. I'm still fairly ignorant about the MCTS literature, but I've definitely seen MCTS married to other value/policy models (linear regressions, for e.g.) that used large numbers of playouts years before Alpha Go came out. Those didn't work out, so seems like the DL aspect of Alpha Zero is somewhat essential to be able to learn games as complex as Go.