

by bonoboTP 2385 days ago

> Why an imbalanced total ratio? Why not average length of heads? Average number of occurrences of "HT"? Frequency of alternations between H and T? Average fraction of times H appears counting only even tosses? Given the combinatorial explosion of possible criteria, I guarantee you I can find a simple-sounding criterion on which any desired string of fair tosses gets a low p-value.

Sure, you can p-hack, and people definitely do it. Still, good papers justify any unconventional choice of what they mean by extreme.

> Let's say I think the string "HHTHT" is extremely indicative of conspiracy.

Then I, as your peer reviewer, will say I require more justification for your premise. Usually, what counts as more extreme is not up to each paper to define; it depends on the conventions of a field that were agreed upon through domain-level reasoning, so you don't always have many degrees of freedom left. (You still have some, which is why p-hacking is a hot topic.)

Again, you're arguing against p-hacking: coming up with your criterion for what counts as extreme after looking at your observation.
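To make the difference concrete, here's a rough sketch (my own toy illustration, not a standard procedure) that enumerates all 32 equally likely 5-toss strings under the fair-coin null. It compares how often you'd "reject" with a target fixed in advance versus a target picked after seeing the data. The names `TARGET` and `choose_target_after_looking` are made up for this example.

```python
from itertools import product

# Hypothetical pre-registered criterion: fixed before any data is seen.
TARGET = "HHTHT"

# All 2**5 = 32 strings are equally likely under the fair-coin null.
outcomes = ["".join(s) for s in product("HT", repeat=5)]


def choose_target_after_looking(observed: str) -> str:
    """Toy p-hacker: declare whatever was observed to be the suspicious string."""
    return observed


# Pre-registered: reject only when the observation equals the fixed target.
pre_registered_rate = sum(obs == TARGET for obs in outcomes) / len(outcomes)

# Post hoc: the target is chosen after looking, so it matches every time.
post_hoc_rate = sum(
    obs == choose_target_after_looking(obs) for obs in outcomes
) / len(outcomes)

print(pre_registered_rate)  # 0.03125
print(post_hoc_rate)        # 1.0
```

Same nominal p-value of 1/32 in both cases, but the actual false positive rate is 1/32 only when the criterion is fixed beforehand.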

Indeed, if we assume no p-hacking, things look much nicer. If for some reason you've argued on YouTube for years that there's a conspiracy to bias the 5 coin tosses that person X will perform on live TV on such-and-such a date towards HHTHT, and then the result actually is HHTHT on live TV, then I think it's fair to say we can reject the null hypothesis at the level of p=1/32. That doesn't mean we have rejected it absolutely and for eternity, but I guess it's worth accepting a paper about your analysis and discussion (taking the analogy back to science). We're already accepting a 5% false positive rate anyway.
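For what it's worth, the arithmetic behind the 1/32 is tiny. A minimal sketch, assuming independent fair tosses and a single sequence named in advance (the function name is just something I made up for illustration):

```python
from fractions import Fraction


def prespecified_sequence_p_value(n_tosses: int) -> Fraction:
    """Chance that n fair, independent tosses match one sequence fixed in advance."""
    return Fraction(1, 2 ** n_tosses)


p = prespecified_sequence_p_value(5)  # the HHTHT prediction
alpha = Fraction(5, 100)              # the conventional 5% threshold

print(p, float(p), p < alpha)  # 1/32 0.03125 True
```

So 1/32 is about 3.1%, which clears the usual 5% bar, though not by much.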