Hacker News new | ask | show | jobs
by psandersen 2188 days ago
This is really interesting, thanks for sharing!

I've been thinking about extensions to decision tree models that could get the benefits of NNs and it seems like there are a few ideas floating around.

For example; Probabilistic Random Forests have some really interesting properties for noisy datasets, e.g. "The PRF accuracy decreased by less then 5% for a dataset with as many as 45% misclassified objects, compared to a clean dataset." - https://arxiv.org/abs/1811.05994

PRF's might be a natural fit for RL, especially methods using monte carlo tree search.

Speculating here as I'm not adequately familiar with stochastic calculus, but intuitively it seems like probabilistic decision trees could be made differentiable since the hard decision threshold in a tree could be a turned continuous (i.e. every split is logistic regression), which might enable some really interesting applications. I personally dream of being able to cleanly integrate decision trees and tools from NNs in something like pyro for a fully Bayesian model.

2 comments

A fast, powerful bayesian model seems like it would be a game-changer. PUCT (the heart of the AlphaZero MCTS that decides which action to choose) really seems setup to model the action choices as a multinomial bayesian inference problem (it already updates the action priors with Dir noise).

Thanks for the link! I don't really know anything about the world of probabilistic trees. I'll check it out.

The only bayesian approach to decision trees I'm familiar with is BART (https://projecteuclid.org/download/pdfview_1/euclid.aoas/127...). I haven't used them, but I'm guessing because it uses MCMC to update the params it's not super fast. I've seen them used in causality applications for partial dependency plots where it's convenient to convey the certainty of a variable's effect.

Check out the SoftBART method, I think there are interesting optimizations possible there. XBart also looks promising as an approach.
There appears to be a LogitBoost tree that does what you say, if I understand you correctly.

[1] https://en.wikipedia.org/wiki/LogitBoost

Thanks for the reference, from a quick read it seems LogitBoost is a booster for ensembling models under a logistic loss.

I meant that the splitting point in a node in the decision trees that make up a random forest is itself a random variable that follows a distribution. Because its a smooth function (i.e. probability of splitting is 50% at the point and rises/falls smoothly) it should in principle be differentiable* so that the whole model can be trained by SGD and/or fit into an end to end learning pipeline with convolutional layers etc.

*Where I'm hazy, is how can a smooth probability function be differentiable when sampling. I'm brainstorming in the open here, will do some reading on stochastic neural networks.