by vlovich123 1234 days ago
Hmmmm… I have code where I’m randomly sampling an exponential function and even thousands of samples are insufficient to pass chi-squared tests at 95% accuracy that the observed distribution matches my expected ground truth exponential function. The reason? Chi-squared needs 5 samples at the tail which has an effective probability of 0. And if I try to flip it and say “run the experiment with 500 samples 100 times but verify the observed matches the expected with a 5% error”, I’ll still see more than 5 runs that fail this.

Is there something special about exponential functions or is it just my misunderstanding of statistics/calculus at play here for doing this correctly? I assume it’s the latter but I haven’t figured out what I’m doing wrong.

3 comments

I'm not sure what exactly you're doing--binning the observations into ranges to run the chi square test?

In any case, it sounds like maybe it falls under the "if you're interested in things that are rare" paragraph in my post above. You can always design statistics that are arbitrarily hard to estimate. The things that we're typically interested in estimating in real life, though--averages, proportions, and similar--are typically estimable with reasonable sample sizes.

> thousands of samples are insufficient to pass chi-squared tests at 95% accuracy that the observed distribution matches my expected ground truth exponential function

It doesn't sound like your test statistic is chi-squared distributed, in which case it's not surprising that your samples fail the test, and sampling more just makes the failure more obvious.

> Is there something special about exponential functions

It's not that exponential functions are special; almost any other function would likely also fail the test. Rather, they're insufficiently special. The chi-squared distribution with k degrees of freedom arises from the sum of the squares of k independent standard normal random variables. Some computations (e.g. sample variance of k draws from a normal distribution) can be expressed using such a sum, but others (e.g. sample variance of k draws from an exponential distribution) cannot.
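For reference, a small simulation of that definition, as a sketch rather than anything taken from the thread: summing the squares of k independent standard normal draws should give values with mean close to k and variance close to 2k, which are the known mean and variance of a chi-squared distribution with k degrees of freedom. The function name and the choice of k = 5 are illustrative.

```python
import random
from statistics import fmean, pvariance

def chi_squared_draw(k, rng):
    """One draw built from the definition: a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # fixed seed so the run is repeatable
k = 5                    # illustrative degrees of freedom
draws = [chi_squared_draw(k, rng) for _ in range(20_000)]

print("mean    :", fmean(draws))      # should land near k
print("variance:", pvariance(draws))  # should land near 2 * k
```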

You'll need to switch to a different test statistic and use that test statistic's distribution (which is unlikely to be chi-squared) to compute your confidence intervals.

Which test statistic should I use? I’ve been trying to figure this out but have been unsuccessful in finding it.

If you can post a detailed explanation of what exactly you're trying to do, and/or your code, I'm happy to try to help you sort it out.

I have a random number function that has an exponentially decreasing probability of generating a given integer within [0, R). So for example, if the range of values is [0, 100), 99 has a 50% probability of being generated, 98 has a 25% chance, and so on.

I’m trying to confirm that if I run this function N times (let’s say 1000), that the frequency of the numbers generated match the expected distribution.
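A minimal Python sketch of one way to set up that comparison, assuming the generator behaves as described (R-1 with probability 1/2, R-2 with probability 1/4, and so on). `sample` is a hypothetical stand-in for the real function, and the critical value uses the Wilson-Hilferty approximation so that only the standard library is needed; a statistics package would give the exact quantile. Values too rare to have an expected count of at least 5 are pooled into a single tail bin, which is the usual way around the 5-per-bin rule.

```python
import random
from statistics import NormalDist

def sample(R, rng):
    """Hypothetical stand-in for the generator: R-1 w.p. 1/2, R-2 w.p. 1/4, ..."""
    k = R - 1
    while k > 0 and rng.random() >= 0.5:
        k -= 1
    return k  # value 0 absorbs the leftover probability mass

def expected_probs(R, n, min_expected=5.0):
    """Bin probabilities: one bin per likely value, plus one pooled tail bin.

    Assumes n is large enough (n >= 10 for the default) that the first bin
    already has an expected count of at least min_expected.
    """
    m = 1
    while m + 1 <= R - 1 and n * 0.5 ** (m + 1) >= min_expected:
        m += 1
    probs = [0.5 ** j for j in range(1, m + 1)]  # values R-1 down to R-m
    probs.append(0.5 ** m)                       # pooled tail: every value below R-m
    return probs

def chi_squared_statistic(samples, R, probs):
    """Pearson statistic: sum of (observed - expected)^2 / expected over the bins."""
    m = len(probs) - 1
    observed = [0] * (m + 1)
    for x in samples:
        j = R - x                                # 1 for the most likely value
        observed[j - 1 if j <= m else m] += 1
    n = len(samples)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def chi2_critical(df, alpha=0.05):
    """Approximate upper-alpha chi-squared quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

rng = random.Random(1)
R, n = 100, 1000
probs = expected_probs(R, n)
stat = chi_squared_statistic([sample(R, rng) for _ in range(n)], R, probs)
df = len(probs) - 1
print(f"bins={len(probs)} df={df} statistic={stat:.2f} critical={chi2_critical(df):.2f}")
```

With n = 1000 this keeps seven individual bins plus one pooled tail bin, so the statistic is compared against a chi-squared distribution with 7 degrees of freedom. A statistic above the critical value is evidence against the generator; a statistic below it only means this test found no evidence of a mismatch.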

Ok, so the big issue is that statistical tests like the chi-squared test are not designed to show that a sample matches a certain distribution. Statistical tests are designed to show the opposite--"this sample does not match that distribution".

If the sample matches the distribution, by design the p-value is going to be uniformly distributed--i.e. a p-value of 0.01 is just as likely as a p-value of 0.99.
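This also bears on the "100 runs of 500 samples" experiment from the original question. A rough simulation, under simplifying assumptions that are mine rather than the thread's: the generator is collapsed to three bins (probabilities 1/2, 1/4, 1/4) so the test has 2 degrees of freedom, where the chi-squared p-value has the closed form exp(-x/2). When the sampler is correct, about 5% of runs should still fail at the 5% level, and the number of failures in a batch of 100 runs is itself random.

```python
import math
import random

PROBS = (0.5, 0.25, 0.25)  # assumed 3-bin version: top value, next value, everything else

def p_value_one_run(n, rng):
    """Draw n samples from PROBS and return the chi-squared goodness-of-fit p-value."""
    counts = [0, 0, 0]
    for _ in range(n):
        u = rng.random()
        counts[0 if u < 0.5 else 1 if u < 0.75 else 2] += 1
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, PROBS))
    return math.exp(-stat / 2)  # survival function of chi-squared with 2 degrees of freedom

def prob_more_failures_than(limit, runs, alpha):
    """P(more than `limit` of `runs` tests reject) when each rejects with probability alpha."""
    at_most = sum(math.comb(runs, i) * alpha ** i * (1 - alpha) ** (runs - i)
                  for i in range(limit + 1))
    return 1 - at_most

rng = random.Random(42)
p_values = [p_value_one_run(500, rng) for _ in range(2000)]
failure_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print("fraction of runs failing at the 5% level:", failure_rate)
print("P(more than 5 failures in 100 runs):", prob_more_failures_than(5, 100, 0.05))
```

The second number comes out at roughly 0.38, so a correct sampler produces more than 5 failing runs out of 100 in well over a third of such batches.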

It's the fact that you need rare samples. The power of sample size is that you can see finer details relative to the fully zoomed out view. If you are interested in an effect which is rare or want to find a small difference between two effects, then you will potentially need a much larger sample size. (For the extremes of this, see the truly gigantic number of samples (trillions+) that are taken in high-energy physics experiments like the LHC: they are looking for very small differences in very rare events. This is also related to why standards for statistical tests are much higher in this field.)
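As a back-of-the-envelope illustration of how quickly the required sample size grows, assuming the halving probabilities described in the thread and the rule of thumb of at least 5 expected observations per bin: testing the j-th most likely value on its own needs about 5 * 2^j samples. The helper name is made up for this sketch.

```python
import math

def min_samples_for_bin(p, min_expected=5):
    """Smallest n with n * p >= min_expected, for a bin of probability p."""
    return math.ceil(min_expected / p)

for j in (7, 10, 20, 30):
    print(f"bin with probability 2^-{j}: n >= {min_samples_for_bin(0.5 ** j):,}")
```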
I don’t actually care about the tails. I’m fine cutting off the comparison and treating sufficiently rare events as having an expected value of 0. And indeed, the bins that show up with “errors” (i.e. deviating by more than 5%) are the ones where events are reasonably expected. The tails are indeed always within 5% of expected.