by LeifCarrotson
2311 days ago
A statistic for everything, but little statistical or scientific literacy: so much data dredging goes on. When someone floats a number like "51 breaking balls with zero missed swings" or "24 straight curveballs" it's never presented with the rate at which this would be expected to occur in the pseudorandom/typical case. There are close to a million pitches thrown in each season. If someone flipped a coin for every pitch in the 2000s, they would probably get a string of 24 heads and a string of 24 tails. Given the number of pitches that have been thrown, and the human tendency to stick with what's working, the only reason that there wouldn't be 24 of one pitch thrown in a row is that they'd deliberately change it up.
|
I had to dig into this for work, and doing statistics on runs is surprisingly interesting. Suppose you've got a sequence of $n$ events, each of which 'succeeds' with probability $p$. The expected length of the longest run is approximately $\log_{1/p}(n(1-p)) + 0.577/\ln(1/p) - 1/2$. For a fair coin with $p = 0.5$, this reduces to $\log_2(n) - 2/3$, which is about 19 for one million events. Amazingly, the spread barely depends on $n$: the standard deviation is about 2 (variance about 3.5) for $p = 0.5$.
Thus, you're probably not going to see a run of 24 heads in 1M events. I'm excited I got to use this information, as the project I learned it for was a total bust.
More here: Schilling (1990, College Math. J.) https://www.csun.edu/~hcmth031/tlroh.pdf
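
If you want to sanity-check the approximation yourself, here is a rough Python sketch. It evaluates the longest-run formula (with 0.577 read as the Euler-Mascheroni constant divided by $\ln(1/p)$) and compares it against a brute-force simulation. The function names, the seed, and the trial counts are my own choices for illustration, not anything from the paper.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (the "0.577" above)


def expected_longest_run(n, p):
    """Approximate expected longest success run in n independent trials."""
    return math.log(n * (1.0 - p), 1.0 / p) + EULER_GAMMA / math.log(1.0 / p) - 0.5


def longest_run(outcomes):
    """Length of the longest streak of truthy values in an iterable."""
    best = current = 0
    for success in outcomes:
        current = current + 1 if success else 0
        best = max(best, current)
    return best


def simulate(n, p, trials, seed=0):
    """Return (mean, standard deviation) of the longest run over simulated sequences."""
    rng = random.Random(seed)  # fixed seed only so results are repeatable
    runs = [
        longest_run(rng.random() < p for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.mean(runs), statistics.pstdev(runs)


if __name__ == "__main__":
    print("formula, n = 1M:", round(expected_longest_run(1_000_000, 0.5), 2))
    mean, sd = simulate(n=10_000, p=0.5, trials=200)
    print("simulated, n = 10k: mean", round(mean, 2), "sd", round(sd, 2))
    print("formula, n = 10k:", round(expected_longest_run(10_000, 0.5), 2))
```

The simulation uses a smaller $n$ than a full season just to keep the runtime short; bump `n` and `trials` if you have the patience.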