
by yummyfajitas 3649 days ago

I have plenty of criticisms of hypothesis testing and p-values. Nevertheless, if you choose to run that type of analysis, do it right - this means sticking with your analysis and not using weasel words like "almost statistically significant" when it doesn't come out the way you want. Incidentally, the real p-value is 11.075% since they ran two hypothesis tests and didn't adjust for multiple comparisons.
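As a rough check on the arithmetic: the 11.075% figure is what a Šidák-style adjustment gives for two independent tests, assuming the unadjusted p-value is the 0.057 reported in the methodology writeup. The comment does not say which correction was used, so the function names and the choice of method below are assumptions for illustration only.

```python
def sidak_adjusted_p(p: float, m: int) -> float:
    """Probability of at least one result this extreme across m independent tests."""
    return 1.0 - (1.0 - p) ** m


def bonferroni_adjusted_p(p: float, m: int) -> float:
    """Simpler, slightly more conservative bound: multiply by the number of tests."""
    return min(1.0, p * m)


# Assumed inputs: a single-test p-value of 0.057 and two hypothesis tests.
p_single = 0.057
num_tests = 2

print(f"Sidak:      {sidak_adjusted_p(p_single, num_tests):.5f}")
print(f"Bonferroni: {bonferroni_adjusted_p(p_single, num_tests):.5f}")
```

Either correction moves the result further from the conventional 5% threshold, which is the point being made about "almost statistically significant".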

Your analysis might be right - if so, that's interesting. I'll take a closer look and write a followup piece if true - among other things glancing at your ROC curve suggests they are pretty close, and perform better for whites in some regions and better for blacks in others. But it's 7:30AM (pre-coffee) and I haven't looked closely yet.

But since PP did not do any of this, my criticism of them holds - they ran an NHST, got the wrong result, and then spouted a bunch of anecdotes instead of admitting that their analysis went against what they wanted to find.


danso 3648 days ago

What's your response to the GP's point that you seem to have missed the part of the notebook where the false positive/negative rates are calculated? In your blog post, you said this:

> Finally, the article includes a table of false positive probabilities (FPP) and false negative probabilities (FNP). This may or may not be evidence of bias - the authors would need to run a statistical test to determine that, which they don't. In fact, I can't even find the place in their R notebook where they did that calculation. Is this the result of bad statistics? Is it merely random chance? Who knows!

Looking at PP's Jupyter Notebook, the calculations seem to be performed at lines 50 onwards (if you're referring to the table that I think you're referring to).

FWIW, those "weasel words" you allege are in the writeup of the methodology, where the audience is expected to follow along and see how the 0.057 is calculated. I'm not sure how you're interpreting that calculation... My read is that it's not the bedrock on which all of the other analyses are based. Where in the story do you see that particular calculation being used as the main (or even ancillary) thrust of the piece?

yummyfajitas 3648 days ago

Aggregate false positive/false negative rates don't prove anything. They can be caused by composition differences, which the analysis demonstrates the existence of.
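A minimal sketch of the composition effect, using made-up numbers that are not taken from the ProPublica data: both hypothetical groups have identical false positive rates within each risk stratum, yet their aggregate rates differ because the groups are distributed differently across strata.

```python
def aggregate_fpr(strata):
    """Pooled false positive rate.

    strata: list of (n_non_reoffenders, false_positive_rate) pairs,
    one pair per risk stratum.
    """
    total_negatives = sum(n for n, _ in strata)
    false_positives = sum(n * fpr for n, fpr in strata)
    return false_positives / total_negatives


# Hypothetical per-stratum rates, identical for both groups.
FPR_LOW_RISK = 0.10
FPR_HIGH_RISK = 0.40

# Hypothetical compositions: group A is mostly low-risk, group B mostly high-risk.
group_a = [(800, FPR_LOW_RISK), (200, FPR_HIGH_RISK)]
group_b = [(400, FPR_LOW_RISK), (600, FPR_HIGH_RISK)]

print(f"Group A aggregate FPR: {aggregate_fpr(group_a):.2f}")
print(f"Group B aggregate FPR: {aggregate_fpr(group_b):.2f}")
```

In this toy setup the aggregate gap comes entirely from the mix of strata, since the score treats both groups identically within each stratum. Whether that explains the real data is a separate empirical question.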

But I'm sure they look convincing to ProPublica's readers who are not statistically sophisticated.

danso 3648 days ago

What's even more concerning is that these statistical noobs sometimes look to experts, unaware that these math geniuses may lack the literacy to read a plaintext notebook to the end.