by Homunculiheaded 5052 days ago
As someone who went from the top 5 to somewhere in the 60s in one contest, and having reviewed the results of past contests, I believe a lot of those small tweaks for slight gains in leaderboard score end up penalizing the contestant for over-fitting. I saw complaints similar to yours in a couple of forums, but I do believe that more often than not those small gains on the leaderboard actually hurt final scores.

Additionally, for contests like Heritage Health [0], I believe the required target of an RMSLE below 0.4 is not considered achievable (I came across this in the forums but never verified it), so even if the contestants just inch past 0.4 it would still be impressive.
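
For anyone unfamiliar with the metric, here is a minimal sketch of RMSLE (root mean squared logarithmic error) using the common log(x + 1) definition. That definition is an assumption on my part rather than something copied from the contest rules, and the sample values are made up purely for illustration. Lower is better.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error over paired non-negative values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for p, a in zip(predicted, actual):
        # log1p(x) computes log(x + 1), so zero values are handled safely
        total += (math.log1p(p) - math.log1p(a)) ** 2
    return math.sqrt(total / len(predicted))


# Hypothetical values, for illustration only (not contest data)
actual = [0, 1, 3, 0, 7]
predicted = [0.2, 0.8, 2.5, 0.1, 4.0]
score = rmsle(predicted, actual)
```

Because the error is taken on a log scale, being off by a fixed amount costs more on small true values than on large ones.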

0. https://www.heritagehealthprize.com/c/hhp/leaderboard

1 comments

It's not about small tweaks; these can be substantial additions to a model that improve its actual, out-of-sample performance. A popular method in these contests is ensembling, which involves building many sub-models and combining their predictions into a single ensemble model. The Netflix Prize winner used ~100 sub-models in their ensemble, but the vast majority of the predictive power came from just three of those sub-models (can't find the source now).
Ah, I think I see what you are saying: essentially that the time it takes to build and tune the blending method and model selection for a 100+ model ensemble gives you only a slightly better prediction than an appropriately chosen, reasonably performant single model, at a large cost in both computation and human labor?
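
As a rough sketch of what combining sub-models can look like, the simplest form is a weighted average of each sub-model's predictions. This is only an illustration, not the blending method the Netflix winner used; the function name `blend`, the weights, and the numbers are all hypothetical.

```python
def blend(sub_model_predictions, weights=None):
    """Combine per-model prediction lists into one ensemble prediction list.

    sub_model_predictions: one list of predictions per sub-model, all the
    same length. weights: optional per-model weights (defaults to equal).
    """
    n_models = len(sub_model_predictions)
    if n_models == 0:
        raise ValueError("need at least one sub-model")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per sub-model")
    n_items = len(sub_model_predictions[0])
    if any(len(preds) != n_items for preds in sub_model_predictions):
        raise ValueError("all sub-models must predict the same items")
    total_weight = sum(weights)
    # Weighted average of the sub-model predictions, item by item
    return [
        sum(w * preds[i] for w, preds in zip(weights, sub_model_predictions))
        / total_weight
        for i in range(n_items)
    ]


# Three hypothetical sub-models scoring the same two items
model_a = [3.0, 4.0]
model_b = [4.0, 4.0]
model_c = [5.0, 1.0]
ensemble = blend([model_a, model_b, model_c], weights=[2.0, 1.0, 1.0])
```

In practice the weights would be fit on held-out data rather than picked by hand, which is where much of the tuning effort goes.
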

What I was addressing was the issue that some users on Kaggle seemed frustrated that people were essentially submitting models with small parameter tweaks in order to marginally boost leaderboard scores. To these complaints I would argue that over-fitting is its own punishment.
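
To make that concrete, here is a toy simulation under some loud assumptions: every tweak has exactly the same true error, the public and private leaderboard scores are that true error plus independent Gaussian noise, and the contestant keeps whichever tweak looks best on the public board. None of the numbers (`true_error`, `noise`, the tweak count) come from a real contest.

```python
import random


def simulate_gap(n_tweaks=50, n_trials=200, noise=0.01, true_error=0.46, seed=0):
    """Average (private - public) error of the tweak chosen by public score."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_trials):
        # Every tweak is equally good in truth; only the sampling noise differs
        public = [true_error + rng.gauss(0, noise) for _ in range(n_tweaks)]
        private = [true_error + rng.gauss(0, noise) for _ in range(n_tweaks)]
        # Keep the tweak with the lowest (best) public leaderboard error
        best = min(range(n_tweaks), key=public.__getitem__)
        total_gap += private[best] - public[best]
    return total_gap / n_trials


gap_many_tweaks = simulate_gap(n_tweaks=50)
gap_single_submission = simulate_gap(n_tweaks=1)
```

In this setup the selected tweak's public score is optimistic on average, and the gap to the private score grows with the number of tweaks tried, even though no tweak is actually better than another.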

Thanks for the clarification!