by ishansharma 2973 days ago
Hi,

Author here. Thanks for the links. I originally planned to try a bigger dataset but didn't want to make the article too long, so I left out the optimizations.

I had to do the same exercise for my ML class and had similar results. Random pruning (mostly) worked great for me.

IMHO, while DTs do overfit a lot, they are a great starting point for beginners because of their (relative) simplicity. Better to start light and then introduce the math-heavy neural nets and SVMs.

3 comments

I don’t know why, but I found regression to be a very easy starting point. y = mx + c is just high school math.
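To make the y = mx + c point concrete, here is a minimal sketch of fitting a line by ordinary least squares in plain Python. The function name `fit_line` and the toy data are made up for illustration; nothing here comes from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Toy data lying exactly on y = 2x + 1 (assumed for the example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)
print(m, c)
```

With noisy data the same two formulas apply; the line just no longer passes through every point.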
Thanks for stopping by the thread. I'm doing my thesis on regression trees and appreciate any love trees get in the ML space. I quite liked the article; my only comment is that the pseudo-code for ID3 was a little hard for me to read, formatting-wise.
Ah, I guess I could have used a Gist for that. Thanks for the suggestion, let me see if I can update the article.
Hey, great explanation of decision trees. I might recommend, instead of going for a bigger dataset, maybe following it up by adding bagging or boosting.

Fairly often people writing tutorials jump straight to Random Forest or Gradient Boosting, and those are great to use but maybe too big a conceptual leap to understand straight away if your theoretical background is weak.
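For anyone curious what the bagging step looks like before jumping to Random Forest, here is a rough sketch in plain Python. It uses a decision stump (a depth-one tree on a single feature) as a stand-in for a full decision tree, and the names `fit_stump`, `bagged_stumps`, `predict_bagged` and the toy data are all hypothetical, not taken from the article.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split tree on (x, label) pairs with labels in {0, 1}.

    Tries every observed x as a threshold and keeps the split
    with the fewest training errors.
    """
    best = None
    for t in sorted({x for x, _ in points}):
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= t else right_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return t, left_label


def predict_stump(stump, x):
    t, left_label = stump
    return left_label if x <= t else 1 - left_label


def bagged_stumps(points, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict_bagged(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy 1-D dataset (assumed): small x is class 0, large x is class 1.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]
models = bagged_stumps(data)
print(predict_bagged(models, 0.5), predict_bagged(models, 6.5))
```

Random Forest adds one more idea on top of this (random feature subsets at each split), which is why bagging alone makes a gentler intermediate step.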

Thank you. I will consider doing that. I will have to create or find a relevant dataset that's small enough for those concepts but I guess that'll help rather than jumping straight to random forest or gradient boosting.