HN Mirror



	by rm999 5053 days ago
	Looks like a cool contest; I may check it out. What bothers me about modeling contests (I've taken part in several; it's my field) is that they often reward putting 90% of your effort into extracting relatively small performance gains. For one thing, it's not a realistic operating environment: there are usually many other factors more important than pure performance, like upkeep, cost, and speed. This is why the Netflix contest's winning models couldn't go into production. The other issue I have is that people with other commitments (like a job) don't really stand a chance, since it's usually very time-consuming to go from fifth place to first.

1 comments

Homunculiheaded 5052 days ago

As someone who went from the top 5 to somewhere in the 60s in one contest, and who has reviewed the results of past contests, I believe a lot of those small tweaks for slight gains in leaderboard scores end up penalizing the contestant for over-fitting. I saw a similar complaint to yours in a couple of forums, but I do believe that more often than not those small gains on the leaderboard actually hurt final scores.
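To illustrate the over-fitting point, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (the 70% true accuracy, the row counts, the number of tweaks, and the helper name `observed_accuracy`); it is not taken from any real contest. Every "tweak" has exactly the same true skill, so whichever one tops the public leaderboard got there by luck, and its private score falls back toward the true value.

```python
import random

rng = random.Random(42)

# Hypothetical setup: every tweak has the same true accuracy.
TRUE_ACCURACY = 0.70
N_PUBLIC = 300     # rows behind the public leaderboard score
N_PRIVATE = 3000   # rows behind the final (private) score
N_TWEAKS = 200     # number of near-identical submissions

def observed_accuracy(n_rows):
    """Accuracy measured on n_rows when each row is right with TRUE_ACCURACY."""
    hits = sum(rng.random() < TRUE_ACCURACY for _ in range(n_rows))
    return hits / n_rows

# Each submission gets an independent public score and private score.
submissions = [
    (observed_accuracy(N_PUBLIC), observed_accuracy(N_PRIVATE))
    for _ in range(N_TWEAKS)
]

# Pick the submission that looks best on the public leaderboard.
best_public, private_of_best = max(submissions, key=lambda s: s[0])

print(f"best public score:      {best_public:.3f}")
print(f"its private score:      {private_of_best:.3f}")
print(f"true accuracy of all:   {TRUE_ACCURACY:.3f}")
```

The gap between the first two printed numbers is pure selection bias: the more tweaks you submit, the larger it tends to get.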

Additionally, for contests like Heritage Health [0], I believe the required goal of an RMSLE below 0.4 is not considered achievable (I came across this in the forums but never verified it), so even if the contestants just inch past 0.4 it would still be impressive.

0. https://www.heritagehealthprize.com/c/hhp/leaderboard
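For anyone unfamiliar with the metric, here is a small sketch of RMSLE (root mean squared logarithmic error) using the usual definition, sqrt(mean((log(p + 1) - log(a + 1))^2)). The function name `rmsle` and the sample numbers are just for illustration; I'm assuming the contest scores it the standard way.

```python
import math

def rmsle(predicted, actual):
    """Root mean squared logarithmic error for paired non-negative values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes log(1 + x), so zero values are handled safely.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Made-up example values: predictions vs. observed counts.
print(rmsle([0.5, 2.0, 0.0], [0, 3, 1]))
```

Because the error is taken on a log scale, being off by one near zero costs far more than being off by one on a large value.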


rm999 5052 days ago

It's not about small tweaks; it can be substantial additions to a model that improve its actual, out-of-sample performance. A popular method in these contests is ensembling, which involves building many sub-models and combining their scores into a single ensemble model. The Netflix winner used ~100 sub-models in their ensemble, but the vast majority of the predictive power came from just three of those sub-models (can't find the source now).
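A rough sketch of the idea, under made-up assumptions: the sub-models here are simulated as the true value plus a shared error plus independent noise, and they are combined with a plain average. That is not the Netflix winner's actual blending method, just the simplest ensemble that shows the diminishing returns. The names `blend`, `rmse`, and `make_sub_model` are hypothetical.

```python
import math
import random

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def blend(sub_model_predictions):
    """Average the predictions of several sub-models, row by row."""
    return [sum(row) / len(row) for row in zip(*sub_model_predictions)]

rng = random.Random(0)
n_rows = 2000
actual = [rng.uniform(1, 5) for _ in range(n_rows)]

# Assumption: all sub-models share one error component that averaging
# cannot remove, plus independent noise that averaging can reduce.
shared_error = [rng.gauss(0, 1) for _ in range(n_rows)]

def make_sub_model():
    return [a + s + rng.gauss(0, 1) for a, s in zip(actual, shared_error)]

sub_models = [make_sub_model() for _ in range(100)]

rmse_single = rmse(sub_models[0], actual)
rmse_three = rmse(blend(sub_models[:3]), actual)
rmse_hundred = rmse(blend(sub_models), actual)

print(f"1 sub-model:    {rmse_single:.3f}")
print(f"3 sub-models:   {rmse_three:.3f}")
print(f"100 sub-models: {rmse_hundred:.3f}")
```

In this toy setup the first few sub-models buy most of the improvement, and the remaining ones add a smaller gain at a much larger build and compute cost.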


Homunculiheaded 5052 days ago

Ah, I think I see what you are saying: essentially, that the time it takes to build and tune the blending method and model selection for a 100+ model ensemble gives you only a slightly better prediction than an appropriately chosen, reasonably performant single model, at a large cost in both computation and human labor?

What I was addressing was the issue that some users on Kaggle seemed frustrated that people were essentially submitting models with small parameter tweaks in order to marginally boost leaderboard scores. To these complaints I would argue that over-fitting is its own punishment.

Thanks for the clarification!
