

by algo_trader 2190 days ago

It's great when you can indeed iterate without 100s of GPU-hours.

Are there any papers/comparisons/tradeoffs on when GBDT predictive-power plateaus compared to a NN?

EDIT: with self-play you can trade off a CPU budget among the GBDT depth, a NN depth, and the roll-out depth, which is super interesting.


cgreerrun 2190 days ago

> Are there any papers/comparisons/tradeoffs on when GBDT predictive-power plateaus compared to a NN?

None specifically that I know of, but I haven't searched.

"Shallow learning" GBDTs can do pretty well on MNIST (https://www.kaggle.com/c/digit-recognizer/discussion/61480), getting 98%+ accuracy compared to the 99%+ of NNs. So I figured that if they can handle MNIST, they can probably handle Connect 4, and that they would be useful for exploring the self-play training efficiency of AlphaZero (at orders of magnitude less compute time/cost).

> with self play you can trade-off a cpu budget for both the GBDT depth, a NN depth, and the roll-out depth - which is super interesting.

Definitely. It'll be interesting to see if a deeper MCTS search with a less powerful model can do pretty well. I'm still fairly ignorant about the MCTS literature, but I've definitely seen MCTS married to other value/policy models (linear regressions, e.g.) that used large numbers of playouts years before AlphaGo came out. Those didn't work out, so it seems like the DL aspect of AlphaZero is somewhat essential to being able to learn games as complex as Go.
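
To make the search-versus-model trade-off concrete, here is a minimal sketch of the PUCT selection rule described in the AlphaGo Zero paper. Whether the project discussed here uses exactly this rule is an assumption; the function name, the dictionary layout, and the `c_puct` value are all hypothetical. The model supplies the priors, the playouts supply the visit counts and value sums, so a weaker model can in principle be compensated for by more visits.

```python
import math
from typing import Dict


def puct_select(
    priors: Dict[str, float],
    visit_counts: Dict[str, int],
    value_sums: Dict[str, float],
    c_puct: float = 1.5,  # exploration constant; the value is an arbitrary assumption
) -> str:
    """Return the action maximizing Q(a) + U(a) at a single tree node."""
    total_visits = sum(visit_counts.values())

    def score(action: str) -> float:
        n = visit_counts[action]
        # Q: mean value observed so far (0.0 for an unvisited action).
        q = value_sums[action] / n if n else 0.0
        # U: exploration bonus, large for high-prior, rarely visited actions.
        u = c_puct * priors[action] * math.sqrt(total_visits) / (1 + n)
        return q + u

    return max(priors, key=score)
```

With equal priors, an unvisited action is preferred over a heavily visited one with a modest mean value; once visit counts even out, the mean value dominates.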


jonath_laurent 2190 days ago

I wonder whether your idea of combining GBDTs with AlphaZero might be most influential in areas where no known neural network architecture provides the right inductive bias for the problem at hand.

I think neural models are pretty unbeatable in many classic RL environments because convolutional neural networks are REALLY good at learning visual representations. In some sense, I suspect that the great success of AlphaGo Zero comes in large part from the fact that it really makes sense to analyze a Go board as a 2D image using convolutional networks: convolutional networks provide the right inductive bias for the problem of learning to play Go.

However, there are tasks where neural networks are not as good, such as symbolic manipulation tasks (I am in a good position to know this as I'm doing research in the area of automated theorem proving). I would be very curious to see how your approach fares on those tasks.


cgreerrun 2189 days ago

> because convolutional networks provide the right inductive bias for the problem of learning to play Go.

Agreed, seems like there will be a set of environments that a GBDT won't work on and a CNN will. I'm only mildly familiar with CNNs, but one thing I wanted to try was to add convolutional features to a GBDT in some way. I'm sure it's been tried and worked/failed before, but it would be an interesting exercise (at least for me) to learn why/if it works.

It'd be valuable if there were a way to get 95% of the benefits of CNNs (scale/position invariance, etc.) with less compute, even if the accuracy was lower. Right now the GBDT is probably making a ton of redundant split points for Connect 4 to re-detect the same position-shifted pattern. There's gotta be a somewhat generalizable way to at least make that more efficient.
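
One rough way to picture the idea: slide a fixed 4-cell window over the board and count what it sees, so that a pattern yields the same feature wherever it sits. The sketch below is a hand-written stand-in, not a learned convolution and not the project's actual feature set; `window_features`, the board encoding (0 empty, 1 and 2 for the players), and the choice of counts are all assumptions made for illustration.

```python
from typing import List

ROWS, COLS = 6, 7
# Step directions for 4-cell lines: horizontal, vertical, and both diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def window_features(board: List[List[int]], player: int) -> List[int]:
    """Return [n1, n2, n3, n4]: nk counts the 4-cell windows that hold
    exactly k pieces of `player` and no opponent pieces."""
    counts = [0, 0, 0, 0]
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in DIRECTIONS:
                end_r, end_c = r + 3 * dr, c + 3 * dc
                # Skip windows that would run off the board.
                if not (0 <= end_r < ROWS and 0 <= end_c < COLS):
                    continue
                cells = [board[r + i * dr][c + i * dc] for i in range(4)]
                own = cells.count(player)
                empty = cells.count(0)
                # Keep only windows still winnable by `player`.
                if own > 0 and own + empty == 4:
                    counts[own - 1] += 1
    return counts
```

A tree model fed these four numbers needs a single split on "windows with three of my pieces" rather than one split per board location. Edge effects mean the counts are not perfectly shift-invariant, so this only approximates what a CNN gets from weight sharing.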

> such as symbolic manipulation tasks (I am in a good position to know this as I'm doing research in the area of automated theorem proving). I would be very curious to see how your approach fares for those tasks.

Definitely! I really want to see a non-game application. I think most people think of AlphaZero as "a cool technique to make SOTA board game agents", but it excites me because it seems like a very generalizable technique that'll be applied to other important types of problems.

I know as much as Jon Snow about symbolic manipulation, but if you have an example symbolic manipulation problem that can be shoehorned into an environment (state, actions, transitions, rewards), I'd be down to code it up and see how it does.
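
As a purely illustrative sketch of what "shoehorned into an environment" could mean, here is a toy string-rewriting task with the four pieces spelled out. Everything in it is hypothetical: the `RewriteEnv` class, the rule set, and the reward scheme are invented for this example, and the rules operate on raw strings and ignore operator precedence, so it is far from a real theorem-proving setup.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical rewrite rules over plain strings: (pattern, replacement).
RULES: List[Tuple[str, str]] = [
    ("x+0", "x"),
    ("x*1", "x"),
    ("x*0", "0"),
    ("0+x", "x"),
]

Action = Tuple[int, int]  # (rule index, position in the string)


@dataclass(frozen=True)
class RewriteEnv:
    state: str       # current expression
    goal: str        # expression to reach
    steps_left: int  # remaining step budget

    def actions(self) -> List[Action]:
        """List every (rule, position) pair where a rule pattern matches."""
        found: List[Action] = []
        for i, (pattern, _) in enumerate(RULES):
            start = self.state.find(pattern)
            while start != -1:
                found.append((i, start))
                start = self.state.find(pattern, start + 1)
        return found

    def step(self, action: Action) -> Tuple["RewriteEnv", float, bool]:
        """Apply one rewrite; return (next env, reward, done)."""
        rule, pos = action
        pattern, replacement = RULES[rule]
        if not self.state.startswith(pattern, pos):
            raise ValueError("rule does not match at that position")
        new_state = self.state[:pos] + replacement + self.state[pos + len(pattern):]
        nxt = RewriteEnv(new_state, self.goal, self.steps_left - 1)
        solved = new_state == self.goal
        # Episode ends on success, on an exhausted budget, or at a dead end.
        done = solved or nxt.steps_left == 0 or not nxt.actions()
        return nxt, (1.0 if solved else 0.0), done
```

For example, starting from `RewriteEnv("x*1+0", "x", 5)`, the only legal action is `(1, 0)`, which yields `"x+0"`; applying `(0, 0)` next reaches the goal `"x"` with reward 1.0. A search-based agent would then be choosing among `actions()` at each state.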
