Hacker News new | ask | show | jobs
by _pastel 1419 days ago
It's baffling to me how little research attention there has been to improving tree-based methods, considering their effectiveness.

For example, LightGBM and XGBoost allow some regularization terms, but the variance/bias is still mostly controlled by globally setting the max depth and max node count (and then parameter searching to find good settings).

Surely there must be more powerful and sophisticated ways of deciding when to stop building each tree than counting the number of nodes? If this was neural nets there would be a hundred competing papers proposing different methods and arguing over their strengths and weaknesses.

I'm not sure whether the problem is that neural nets are just fundamentally more sexy, or that in order to make SOTA improvements in GBMs you need to dive into some gnarly C++. Probably both.

4 comments

Why do you think there has been little research attention? Time was, 'machine learning' was little but tree-based methods (and that was how they distinguished themselves from 'AI'). Go look at Breiman's CV or random conference proceedings. Or as tree-based method proponents love to point out, pretty much everyone on Kaggle up until recently used trees for everything non-image-based; that's a ton of effort invested in tweaking trees. And there were hardware efforts to accelerate them (I recall MS talking about how they were investing in FPGAs for MS Azure to run trees better), so 'GPUs' isn't an excuse.
I think people throwing algorithms at problems at Kaggle or combining them is not the kind of research, which the GP was referring to.

Limiting a tree by its depth is a very general global parameter for a tree. One could try to use any kind of criteria for deciding when to stop making more child nodes in a tree, depending on what information is locally available and that depends on how the tree algorithm is actually implemented. So people doing Kaggle challenges would have to dig into the source code of the tree implementation, then change things there, to modify locally available knowledge, in order to allow for more fine grained decision making at each node.

That is only the constructive side of things, when the tree is created. Even more powerful is the post processing / destructive / prunning side of things, because theoretically the whole tree structure can be taken into account, when deciding what branch to cut.

I think the GP is referring to research in the area of what other useful things one can come up with here. As far as I know, these are not the usual things people do in Kaggle challenges. Correct me if I am wrong.

Because anytime I search for literature on basic tweaks to the structure of decision trees, I find nothing.

Another example: modern GBM implementations all use binary trees. How would they perform with ternary trees? Or k-way trees for larger k plus some additional soft penalty that encourages minimizing the number branches, unless the information gain is really worth it?

(You can simulate ternary trees with binary, but the splitting behavior is different because ternary can more easily identify good splitting regions in the middle range of the histogram values.)

This seems like such a basic structural question, but the only relevant search result was this downvoted Stack Exchange question from 5 years ago: https://stats.stackexchange.com/questions/305685/ternary-dec...

There are lots of papers on ternary trees in old-school contexts like Ternary Decision Diagrams etc., but nothing relevant to the context of modern tree ensemble performance. Or maybe I'm just bad at searching?

(I implemented this and saw a small accuracy increase from ternary on large datasets, but much worse training speed because you get less mileage from the histogram subtraction trick. Maybe the accuracy would be even better with a more clever choice of soft penalty.)

> LightGBM not improving, meanwhile 8-figure budgets builds GPU and auto-logins..

My take? management agenda to build plug-and-play researchers (humans on jobs), rather than domain specialists. DeepLearning fits that description with all-plumbing, all-the-time.. domain specialists want graduate school, weekends and health benefits..

Is this supposed to be an ironic take on the "plug-and-play HN comment blaming cost cutting management for everything"?

Because it's not this.

Deep learning researchers are specialists in their specific technologies. For example recent advances in transformers not withstanding, it takes quite a long time for say someone who knows how to build deep learning models on images using CNNs to come up to speed on how build good natural language processing models.

There are a fair number of papers (start with Dart/dropout, Bart (bayesian sampling of the whole gbm) but they start to look like global optimization problems and part of the reasons trees work so well is that the local greedy optimization can be made super fast on modern cpu caches.

So even if you can fit a more compact forest that performs well through clever regularization its usually better/faster in practice to grow more simple trees with more randomization and let overfitting average out.

I think part of the problem is that the upper bound on neural nets, as far as we can tell, might very well be general intelligence, and things like self-driving cars, and other nearly magical use-cases that seem within reach. Whereas tree based models, for a series of reasons, many related to scaling, don't offer that feeling of limitless potential.
Maybe. But then again we often try to solve very specific problems, which are very far from requiring anything close to a general intelligence. Heck, a general intelligence might even be bad at things, just like humans are not as good as computers at certain things.
There's a science-fiction story somewhere in there. Imagine an alien planet where brains evolved using a structure completely different than a neural net... some sort of classification model where they were incredibly efficient inside their domain, but broke unpredictably when brought outside of it.