Hacker News
by mej10 5052 days ago
I am really interested in ML, but have only recently been diving into it. I have watched all of the videos for Andrew Ng's Coursera course (and done most of the programming exercises), but just looking over some of the Kaggle contests I think I would be quickly out of my depth.

Would I be wasting my time attempting these with such a basic level of knowledge?

2 comments

If you read through the past winners you'll find that in many cases a very simple model wins. I believe one of the winners who wrote a blog post had pretty much the background you describe.

When I started I was in a similar position to you and just wanted to see if I could even tread water with some of the really knowledgeable members of the community. I ended up placing in the top 5 for one of the contests I was in (with btw a really simple model).

They usually give you some starter code in either R or Python that reproduces the results for a benchmark. Start there, then use cross-validation to see if you can beat that benchmark, and if you do, submit. It's very addictive and you'll come away knowing a lot more than you started with.
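A rough sketch of that "beat the benchmark under cross-validation, then submit" loop, in plain Python. Everything here is an assumption for illustration: the data is made up, `fit_benchmark` is a stand-in for whatever starter code a contest ships, and `fit_line` is just a one-variable least-squares fit, not anything Kaggle provides.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_score(fit, xs, ys, k=5, seed=0):
    """Mean squared error averaged over k held-out folds (lower is better)."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training rows, then score on the held-out rows.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((predict(xs[i]) - ys[i]) ** 2 for i in held_out))
    return mean(errors)

def fit_benchmark(xs, ys):
    """Hypothetical benchmark: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate model: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Made-up data: y is roughly 2x + 1 plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

benchmark_mse = cv_score(fit_benchmark, xs, ys)
candidate_mse = cv_score(fit_line, xs, ys)
should_submit = candidate_mse < benchmark_mse  # only submit if CV says you beat it
print(benchmark_mse, candidate_mse, should_submit)
```

Both models are scored on the same folds (same `seed`), so the comparison is like for like. The point is only the workflow: score the benchmark, score your change, and submit when the held-out number improves.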

Awesome, thanks for the info. I am checking out some of the benchmarks now.

Why do you think it is that simple models often win? Is it due to the experts not participating, or is there a lot more low-hanging fruit than I previously thought? Or is it just that simple models are easier for humans to use and reason about, and thus easier to get right?

If you look at the bios of some of the top 50 Kagglers, there are some pretty impressive backgrounds there, and they participate heavily. So I don't think that's the reason.

I know from my own beginner mistakes that it's a big error to take something out-of-the-box and immediately chase better CV scores by building much more complicated solutions.

The truth is a lot of work has been put into any standard implementation of an SVM, RandomForest, etc. (and even more work has been put into the theory behind those algorithms). Since I haven't come in 1st in any competition and am not an ML expert, I don't think I can give you the correct strategy to win. But I can say that, as a general trend, all of my attempts to create non-standard complicated models did terribly, while many of the decisions I made based on research into fixing a particular problem in a known solution performed better (i.e. "How to deal with imbalanced classification problems with a RF?" type questions).
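As one example of that kind of targeted fix, a common answer to the imbalanced-classes question is to reweight the classes rather than invent a new model. Here is a minimal sketch of the usual "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function name and the 95/5 label split are made up for illustration; as far as I know this is the same formula scikit-learn applies when you pass `class_weight="balanced"` to `RandomForestClassifier`, but check the docs for your version.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so every class contributes the same
    total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up imbalanced labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights the 5 positives carry the same total weight as the 95 negatives, which is the whole idea: the standard model stays as it is and only the cost of mistakes on the rare class changes.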

I work at Kaggle.

In many cases where simple models win, there's some insight into the data that the winner found - engineered a new feature, or noticed a pattern and appropriately tuned a particular method. Where those insights exist, they often overshadow any gains by super-sophisticated ML techniques.
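A toy illustration of how one engineered feature can matter more than the model, under made-up assumptions: the data is a synthetic grid where the label depends only on the ratio a / b, and the "model" is the simplest one I could think of, a single-threshold rule. None of this comes from a real competition.

```python
def best_stump_accuracy(values, labels):
    """Best training accuracy of the rule 'predict 1 when value > t',
    trying every observed value (plus -inf) as the threshold t."""
    best = 0.0
    for t in [float("-inf")] + sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / len(labels))
    return best

# Made-up data: every (a, b) pair on a 10 x 10 grid, labelled by the ratio.
rows = [(a, b) for a in range(1, 11) for b in range(1, 11)]
labels = [1 if a / b > 1.5 else 0 for a, b in rows]

# The same simple rule on each raw column, then on the engineered ratio.
acc_a = best_stump_accuracy([a for a, _ in rows], labels)
acc_b = best_stump_accuracy([b for _, b in rows], labels)
acc_ratio = best_stump_accuracy([a / b for a, b in rows], labels)
print(acc_a, acc_b, acc_ratio)
```

Neither raw column can be split perfectly by one threshold, because the same value of a (or b) appears with both labels, while the ratio feature can. A fancier model could learn the ratio on its own, but it has to spend data and capacity rediscovering what one line of feature engineering hands it directly.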

Makes sense. Thanks!
You might not do particularly well (or you might!), but diving into a Kaggle competition is one of the better ways to learn a lot about machine learning very quickly.