Hacker News new | ask | show | jobs
by gwern 1422 days ago
Why do you think there has been little research attention? Time was, 'machine learning' was little but tree-based methods (and that was how they distinguished themselves from 'AI'). Go look at Breiman's CV or random conference proceedings. Or as tree-based method proponents love to point out, pretty much everyone on Kaggle up until recently used trees for everything non-image-based; that's a ton of effort invested in tweaking trees. And there were hardware efforts to accelerate them (I recall MS talking about how they were investing in FPGAs for MS Azure to run trees better), so 'GPUs' isn't an excuse.
2 comments

I think people throwing algorithms at problems at Kaggle or combining them is not the kind of research, which the GP was referring to.

Limiting a tree by its depth is a very general global parameter for a tree. One could try to use any kind of criteria for deciding when to stop making more child nodes in a tree, depending on what information is locally available and that depends on how the tree algorithm is actually implemented. So people doing Kaggle challenges would have to dig into the source code of the tree implementation, then change things there, to modify locally available knowledge, in order to allow for more fine grained decision making at each node.

That is only the constructive side of things, when the tree is created. Even more powerful is the post processing / destructive / prunning side of things, because theoretically the whole tree structure can be taken into account, when deciding what branch to cut.

I think the GP is referring to research in the area of what other useful things one can come up with here. As far as I know, these are not the usual things people do in Kaggle challenges. Correct me if I am wrong.

Because anytime I search for literature on basic tweaks to the structure of decision trees, I find nothing.

Another example: modern GBM implementations all use binary trees. How would they perform with ternary trees? Or k-way trees for larger k plus some additional soft penalty that encourages minimizing the number branches, unless the information gain is really worth it?

(You can simulate ternary trees with binary, but the splitting behavior is different because ternary can more easily identify good splitting regions in the middle range of the histogram values.)

This seems like such a basic structural question, but the only relevant search result was this downvoted Stack Exchange question from 5 years ago: https://stats.stackexchange.com/questions/305685/ternary-dec...

There are lots of papers on ternary trees in old-school contexts like Ternary Decision Diagrams etc., but nothing relevant to the context of modern tree ensemble performance. Or maybe I'm just bad at searching?

(I implemented this and saw a small accuracy increase from ternary on large datasets, but much worse training speed because you get less mileage from the histogram subtraction trick. Maybe the accuracy would be even better with a more clever choice of soft penalty.)