by brudgers
4420 days ago
Data gone wrong? Did they start hanging out with the bad kids and take up cigarettes, drinking, and gambling, only to progress to crack and burglaries, one of which ended with our Data shooting a homeowner who returned unexpectedly? I guess I don't understand what data is. I always thought it was a set of values. And I always thought that the problem with using data lay in the interpretation: a prudent consumer of data would always be careful to distinguish between a random sample and a self-selecting sample when drawing conclusions, and would then state conclusions only in the language of statistical inference. Leaving aside the question of why I should give a fuck about this supposed outrage, why does the author expect a strong correlation between movie quality and the ratings on a website devoted to providing entertainment by having users rate movies? When The Matrix is purported to be a better movie than Lawrence of Arabia, the problems of interpretation are systemic.
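To make the random-versus-self-selecting distinction concrete, here is a minimal sketch with entirely hypothetical numbers: every true score from 1 to 10 is assumed equally common among viewers, and a viewer's chance of bothering to rate is assumed proportional to how much they liked the film. The names `mean`, `self_selected`, and `respond` are illustrative only, not taken from any real site or library. Exact fractions are used so the result carries no sampling noise.

```python
from fractions import Fraction


def mean(dist):
    """Mean of a discrete distribution {value: weight}; weights need not sum to 1."""
    total = sum(dist.values())
    return sum(v * w for v, w in dist.items()) / total


def self_selected(dist, respond):
    """Weight each value by the probability that someone holding it responds."""
    return {v: w * respond(v) for v, w in dist.items()}


def respond(score):
    # Hypothetical response model: chance of rating grows with enjoyment.
    return Fraction(score, 10)


# Hypothetical population: true scores 1..10, all equally common among viewers.
population = {score: Fraction(1, 10) for score in range(1, 11)}

print(mean(population))                          # 11/2 (random sample target)
print(mean(self_selected(population, respond)))  # 7    (self-selected ratings)
```

A random sample would center on 11/2, while the self-selected ratings center on 7, even though no individual rating is false. The bias comes entirely from who chooses to answer.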
I thought it was indicative of a larger trend of crowdsourced data being used to illustrate a point, like the Google Flu Trends articles, which have gone around HN at least twice: once when they were successful (https://news.ycombinator.com/item?id=5040204) and once when they were critiqued (e.g., https://news.ycombinator.com/item?id=7455307).
I work a lot with sampled data, and I have found that sampling issues can be some of the most difficult to appreciate and to quantify -- even for experts.
I guess it comes down to sampling from one distribution, P(x), when the situation you really care about follows a different distribution, P'(x). If P is far from P', your conclusions from P can be arbitrarily bad. If you have an adversary moving P around deliberately, as here, it's even worse.
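One standard remedy for a P versus P' mismatch, not something proposed in the thread, is importance weighting: reweight each observation by P'(x)/P(x). The sketch below assumes both distributions are discrete and known exactly; `expect`, `importance_weighted`, and the toy distributions are hypothetical. It also checks the support condition, because if P' puts mass where P has none, no reweighting can recover the answer, which is one concrete way conclusions from P become arbitrarily bad.

```python
from fractions import Fraction


def expect(dist, f):
    """Exact expectation of f under a discrete distribution {x: probability}."""
    # Skip zero-probability points so f (or a weight) is never evaluated there.
    return sum(p * f(x) for x, p in dist.items() if p)


def importance_weighted(p, p_prime, f):
    """Compute E_{P'}[f] using only an expectation under P, with weights P'(x)/P(x).

    Requires P(x) > 0 wherever P'(x) > 0; otherwise the target is unrecoverable.
    """
    missing = [x for x, q in p_prime.items() if q > 0 and not p.get(x)]
    if missing:
        raise ValueError(f"P has no mass at {missing}; E_P'[f] cannot be recovered")
    # E_P[f(x) * P'(x) / P(x)] = sum_x P'(x) f(x) = E_P'[f]
    return expect(p, lambda x: f(x) * p_prime.get(x, 0) / p[x])


def identity(x):
    return x


# Hypothetical toy distributions over a binary outcome x.
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # what we happen to sample
P_prime = {0: Fraction(1, 10), 1: Fraction(9, 10)}  # what we actually care about

print(expect(P, identity))                        # 1/2  (naive estimate)
print(expect(P_prime, identity))                  # 9/10 (true target)
print(importance_weighted(P, P_prime, identity))  # 9/10 (recovered)

# If P' puts mass where P has none, reweighting cannot help.
try:
    importance_weighted(P, {2: Fraction(1)}, identity)
except ValueError as err:
    print(err)
```

In practice neither P nor P' is usually known exactly, and an adversary who moves P deliberately also makes any estimated weights unreliable, so this is a best case rather than a general fix.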