by ishansharma 2973 days ago

Hi,

Author here. Thanks for the links. I originally planned to try a bigger dataset but didn't want to make the article too long, so I left out the optimizations.

I had to do the same exercise for my ML class and had similar results. Random pruning (mostly) worked great for me.

IMHO, while DTs do overfit a lot, they are a great starting point for beginners because of their (relative) simplicity. Better to start light and then introduce the math-heavy neural nets and SVMs.

3 comments

abhishekjha 2973 days ago

I don’t know why, but I found regression to be a very easy starting point. y = mx + c is just high school math.
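
To make the "y = mx + c" point concrete, here is a minimal sketch of fitting that line by ordinary least squares in plain Python. The function name `fit_line` and the toy data are assumptions made up for illustration; they do not come from the article under discussion.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Toy data lying exactly on y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)
print(m, c)
```

The whole fit is two sums and a division, which is arguably why regression feels approachable as a first model.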

mliswhat 2973 days ago

Thanks for stopping by the thread. I'm doing my thesis on regression trees and appreciate any love trees get in the ML space. I quite liked the article; my only comment is that the pseudo-code for ID3 was a little hard for me to read, formatting-wise.

ishansharma 2973 days ago

Ah, I guess I could have used a Gist for that. Thanks for the suggestion, let me see if I can update the article.

jrumbut 2972 days ago

Hey, great explanation of decision trees. I might recommend, instead of going for a bigger dataset, maybe following it up by adding bagging or boosting.

Fairly often, people writing tutorials jump straight to Random Forest or Gradient Boosting. Those are great to use, but maybe too big a conceptual leap to understand straight away if your theoretical background is weak.
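
As a rough illustration of the intermediate step between a single tree and a Random Forest, here is a minimal bagging sketch in plain Python. It uses one-split trees (decision stumps) on 1-D data to stay short; `fit_stump`, `bagged_fit`, the parameter defaults, and the toy dataset are all hypothetical and are not taken from the article.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split tree (stump) on 1-D data given as (x, label) pairs."""
    best = None
    for threshold, _ in points:
        # Try both label assignments for the two sides of the split.
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= threshold else right) != y for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def bagged_fit(points, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap sample, then majority-vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy binary dataset (made up): small x values are class 0, large are class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
predict = bagged_fit(data)
print(predict(1), predict(9))
```

A Random Forest adds one more idea on top of this (random feature subsets at each split), which is why bagging on its own can be a gentler stepping stone.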

ishansharma 2972 days ago

Thank you. I will consider doing that. I will have to create or find a relevant dataset that's small enough for those concepts, but I guess that'll help more than jumping straight to random forest or gradient boosting.
