HN Mirror


	by Aqwis 3294 days ago
	Does anyone know why most machine learning libraries (notably scikit-learn) implement trees and tree ensembles based on the CART algorithm? It seems like other types of trees (See5, MARS) could have advantages, particularly in ensembles, since they were specifically developed as improvements on CART/C4.5.


lackadaisicall 3294 days ago

> Does anyone know why most machine learning libraries (notably scikit-learn) implement trees and ensembles of trees based on the CART algorithm?

This is just my theory.

Because it was one of the first tree-based algorithms, and Leo Breiman really did market it. He even trademarked Random Forests.

Kind of like what XGBoost is doing right now.

My professor is trying to market his own version too, if I ever get around to finishing my thesis. The problem with his algorithm is that it hasn't been ported to any other language: it was written years ago in C, and he's not a programmer.

I'd imagine the same is true of the other algorithms. Leo, on the other hand, was a CS major on top of a stats major.

Also, there are tons of regression algorithms out there that can be turned into trees (as their fully nonparametric counterparts).
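To make the "nonparametric counterpart" point concrete, here is a minimal pure-Python sketch contrasting ordinary least squares with a depth-1 regression tree (a "stump") on a step function. The function names are hypothetical and this is not any library's API. The stump picks the single split that minimizes summed squared error, which is the regression criterion CART-style trees use; it assumes distinct x values.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def fit_stump(x, y):
    """Depth-1 regression tree: the split that minimizes summed squared error.

    Assumes distinct x values; returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        left = [b for _, b in pairs[:i]]
        right = [b for _, b in pairs[i:]]
        sse = (sum((b - mean(left)) ** 2 for b in left)
               + sum((b - mean(right)) ** 2 for b in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
            best = (sse, threshold, mean(left), mean(right))
    return best[1:]


def squared_error(predict, x, y):
    return sum((predict(a) - b) ** 2 for a, b in zip(x, y))


# A step function: a straight line cannot represent it, a single split can.
x = [float(i) for i in range(10)]
y = [1.0] * 5 + [5.0] * 5

intercept, slope = fit_line(x, y)
threshold, left_mean, right_mean = fit_stump(x, y)

line_error = squared_error(lambda a: intercept + slope * a, x, y)
stump_error = squared_error(lambda a: left_mean if a <= threshold else right_mean, x, y)
print(threshold, left_mean, right_mean)  # 4.5 1.0 5.0
print(line_error > stump_error)          # True
```

Growing the stump further by splitting each side again gives a full regression tree, which makes no assumption about the shape of the relationship.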

But in the end, linear regression is the most popular, followed by logistic regression, IIRC. There are also survival trees and BART (Bayesian additive regression trees), which is still in its infancy.


joe636434 3294 days ago

A professor who invents his own version of a tree but can't program? Seriously? Is it common in academic circles for a computer science professor to be unable to program?


nerdponx 3294 days ago

AFAIK:

- ID3, CART, C4.5, and C5 are all conceptually equivalent "recursive partitioning" algorithms, and CART is sometimes used as a catch-all term instead of the phrase "recursive partitioning".

- MARS requires two passes over the data (a forward pass that adds hinge basis functions and a backward pass that prunes them).

- CART is "dumber" than CHAID, which could be seen as a benefit for "high-volume" ensembles like RFs and GBMs. One blogger writes that CHAID is a better explanatory/exploratory tool, while CART is a better prediction tool: http://www.bzst.com/2006/10/classification-trees-cart-vs-cha...
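On the MARS point: the forward pass greedily adds pairs of hinge functions, max(0, x - t) and max(0, t - x), and the backward pass removes terms using generalized cross-validation (GCV). Below is a heavily simplified toy sketch, assuming a single predictor, a single knot, and a simplified GCV-style complexity penalty (the penalty value 3.0 and all function names are assumptions for illustration). It is not Friedman's full algorithm or any package's API.

```python
def hinge(x, knot, sign):
    """Hinge basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def least_squares_rss(columns, y):
    """Residual sum of squares of y regressed on `columns` (normal equations, few columns)."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    b = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, k))) / a[r][r]
    fitted = [sum(cf * col[i] for cf, col in zip(coef, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_pass(x, y):
    """Forward pass (toy): try each interior data point as a knot, keep the best-fitting one."""
    ones = [1.0] * len(x)
    best_knot, best_rss = None, float("inf")
    for knot in sorted(set(x))[1:-1]:  # interior knots keep both hinges non-degenerate
        cols = [ones, [hinge(v, knot, +1) for v in x], [hinge(v, knot, -1) for v in x]]
        rss = least_squares_rss(cols, y)
        if rss < best_rss:
            best_knot, best_rss = knot, rss
    return best_knot


def backward_pass(x, y, knot, penalty=3.0):
    """Backward pass (toy): drop hinge terms while a GCV-style score keeps improving."""
    n = len(x)
    terms = {
        "intercept": [1.0] * n,
        "h+": [hinge(v, knot, +1) for v in x],
        "h-": [hinge(v, knot, -1) for v in x],
    }

    def gcv(names):
        rss = least_squares_rss([terms[m] for m in names], y)
        n_knots = 1 if len(names) > 1 else 0
        complexity = len(names) + penalty * n_knots  # simplified effective-parameter count
        return (rss / n) / (1.0 - complexity / n) ** 2

    kept = ["intercept", "h+", "h-"]
    best = gcv(kept)
    while len(kept) > 1:
        trials = [[m for m in kept if m != drop] for drop in kept[1:]]  # never drop the intercept
        score, trial = min((gcv(t), t) for t in trials)
        if score >= best:
            break
        best, kept = score, trial
    return kept


# A single kink at x = 4 plus small alternating "noise" (deterministic, for reproducibility).
x = [float(i) for i in range(11)]
y = [1.0 + 2.0 * max(0.0, v - 4.0) + 0.05 * (-1) ** i for i, v in enumerate(x)]

knot = forward_pass(x, y)
kept = backward_pass(x, y, knot)
print(knot)  # 4.0
print(kept)  # ['intercept', 'h+']
```

The forward pass finds the kink; the backward pass then drops max(0, 4 - x), which only fits noise and does not pay for its complexity penalty. Real MARS repeats the forward step over many knots and variables (including interactions), which is where the extra cost compared with a single recursive-partitioning pass comes from.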

Some other comparisons:

https://stats.stackexchange.com/a/61245/36229

https://stackoverflow.com/q/9979461/2954547

So the answer is that CART specifically isn't used everywhere. Recursive partitioning is used everywhere, mostly because it is simple.
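A minimal sketch of plain recursive partitioning for classification, assuming numeric features, binary splits at midpoints between observed values, Gini impurity as the split criterion (the default `criterion` in scikit-learn's `DecisionTreeClassifier`, and one of the criteria described for CART), and a fixed `max_depth` in place of CART's cost-complexity pruning. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold) minimizing weighted Gini, or None if no split helps."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Greedy recursive partitioning: choose a split, then recurse on each side."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(node, row):
    """Walk internal nodes (feature, threshold, left, right) down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


rows = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
labels = ["a", "a", "b", "b", "a", "a"]
tree = grow(rows, labels)
print(tree)  # (0, 2.5, 'a', (0, 4.5, 'b', 'a'))
```

The whole thing is a few dozen lines with no statistical tests or second pass, which fits the "mostly because it is simple" point. The greedy choice has a known cost: a pattern like XOR, where no single split reduces impurity, will not be split at all by this sketch.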
