by benchaney 1661 days ago
The issue is that there is no actual evidence that the data is being recorded improperly. Anyone can come up with a just-so story about how the data could be corrupted, but writing such a blog post does not put a burden on the people doing the actual analysis to refute the claim.

That's not how it works. Burdens of proof belong in courts of law; you can't do a statistical analysis without addressing whether the data being used is sound.

You'll see plenty of first-year undergrad papers that rely on mediocre data, run all sorts of straight-from-the-textbook statistical analysis on it, report p-values to five digits, and only then discuss the data's mediocrity in a "Discussion" section. That's a common trope. But if the data is mediocre, you don't have a paper in the first place! Your p-values are junk, and so are the abstract and the conclusion. You can't hand-wave that away by raising easily addressed problems with the data in the Discussion section; you should have addressed them up front. A great physics prof said: "any figure without an uncertainty is meaningless". The same goes for any statistical analysis that doesn't validate the data it relies on.
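
To make the "your p-values are junk" point concrete, here's a toy sketch. Everything in it is made up for illustration: the numbers, the 0.3 recording offset, and the `welch_t` helper are my assumptions, not anything taken from the data under discussion. Two groups have identical true values, but one is recorded with a constant bias, and a textbook Welch t-test then reports a strong "effect" that is purely an artifact of how the data was recorded.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


# Made-up "true" values: both groups are identical, so there is no real effect.
true_a = [10.0 + 0.1 * (i % 10) for i in range(50)]
true_b = list(true_a)

# Assumed recording flaw: group B's values are logged 0.3 too high.
recorded_b = [x + 0.3 for x in true_b]

t_clean = welch_t(true_a, true_b)       # no difference in the true data
t_biased = welch_t(true_a, recorded_b)  # difference created only by the bias

print(f"t on true data:     {t_clean:.2f}")
print(f"t on recorded data: {t_biased:.2f}")
```

With 50 points per group, the recorded data gives |t| above 5, which any textbook would call overwhelmingly significant, even though the true difference is exactly zero. The test can't tell a real effect from a recording problem; only checking the data can.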

But all that's beside the point, since the article I linked gives good reasons to believe the original government data does have issues.