by mturmon 4420 days ago
Everything you say is right, of course. Yet, I upvoted the story.

I thought it was indicative of a larger trend where crowdsourced data are used to illustrate a point. Like the Google flu trends articles, which have gone around HN at least twice, once when they were successful (https://news.ycombinator.com/item?id=5040204) and once when they were critiqued (e.g., https://news.ycombinator.com/item?id=7455307).

I work a lot with sampled data, and I have found that sampling issues can be some of the most difficult to appreciate and to quantify -- even for experts.

I guess it comes down to sampling from one distribution, P(x), when the situation you really care about samples according to a different distribution P'(x). If P is far from P', your conclusions from P can be arbitrarily bad. If you have an adversary moving P around deliberately, as here, it's even worse.
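To make the P versus P' point concrete, here is a minimal sketch. The two rating distributions are made-up numbers for illustration, not data from the article: `p_sample` stands in for who actually voted, `p_target` for the audience you care about.

```python
# Hypothetical rating distributions over scores 1..5 (illustrative numbers only).
scores = [1, 2, 3, 4, 5]
p_sample = [0.70, 0.10, 0.05, 0.05, 0.10]  # P: the voters you actually got
p_target = [0.05, 0.15, 0.30, 0.30, 0.20]  # P': the population you care about


def expectation(values, probs):
    """Expected value of a discrete variable with the given probabilities."""
    return sum(v * p for v, p in zip(values, probs))


mean_under_p = expectation(scores, p_sample)       # what the raw votes say
mean_under_target = expectation(scores, p_target)  # what you wanted to know

# Importance reweighting: E_P'[x] = E_P[x * P'(x) / P(x)].
# This only works if P'(x)/P(x) is known and P(x) > 0 wherever P'(x) > 0.
weights = [pt / ps for pt, ps in zip(p_target, p_sample)]
reweighted_mean = sum(s * w * ps for s, w, ps in zip(scores, weights, p_sample))

print(mean_under_p, mean_under_target, reweighted_mean)
```

The catch is that the reweighting step needs the ratio P'(x)/P(x), which is usually exactly what you don't know. With an adversary choosing P, you can't even assume it stays fixed.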

1 comments

Statistics experts are fewer and further between than experts in other fields who use statistics to justify their decisions, and the article shows how far off base most people are. After all, the author ran a numerical analysis of the database, presents the findings as facts about the data, and includes a rough statistical comparison of the voting patterns of the lowest rated [called 'worst'] and the second lowest rated movies.

If there is an interesting statistical result, it's that the movie's rating is entirely consistent with crowdsourced predictions. The theory is that 'wisdom of crowds' results directly from diversity among those making predictions.[1] In the case of the lowest rated movie, those making predictions were unusually homogeneous, and therefore an inaccurate prediction as to the quality is unsurprising.
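For anyone who wants to check the identity behind the theorem (collective squared error = average individual squared error minus prediction diversity), here is a rough sketch. The function name and the two sets of predictions are invented for illustration and are not taken from the article's data.

```python
def diversity_decomposition(predictions, truth):
    """Return (collective error, average individual error, diversity).

    All three are squared-error quantities; the crowd's prediction is
    the plain mean of the individual predictions.
    """
    n = len(predictions)
    crowd = sum(predictions) / n
    collective_error = (crowd - truth) ** 2
    avg_individual_error = sum((p - truth) ** 2 for p in predictions) / n
    diversity = sum((p - crowd) ** 2 for p in predictions) / n
    return collective_error, avg_individual_error, diversity


truth = 6.0  # hypothetical "true" quality on a 1-10 scale

# A diverse crowd: individuals are off, but their errors cancel.
diverse = diversity_decomposition([2.0, 5.0, 8.0, 9.0], truth)

# A homogeneous crowd: everyone is off in the same direction.
homogeneous = diversity_decomposition([1.0, 1.0, 1.5, 1.0], truth)

for label, (collective, individual, spread) in (
    ("diverse", diverse),
    ("homogeneous", homogeneous),
):
    # The identity holds exactly (up to floating point rounding).
    print(label, collective, individual - spread)
```

In the homogeneous case the diversity term is close to zero, so the crowd is barely better than its average member.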

Again, it's all in the interpretation, e.g. there's statistical evidence that a lot of morons ranked The Matrix.

[1] Diversity Prediction Theorem: http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/predi...