
> thousands of samples are insufficient to pass chi-squared tests at 95% accuracy that the observed distribution matches my expected ground truth exponential function

It doesn't sound like your test statistic is chi-squared distributed, in which case it's not surprising that your samples fail the test, and sampling more just makes the failure more obvious.

> Is there something special about exponential functions

It's not that exponential functions are special; almost any other function would likely also fail the test. Rather, they're insufficiently special. The chi-squared distribution with k degrees of freedom arises as the sum of the squares of k independent standard normal random variables. Some statistics (e.g. the suitably scaled sample variance of draws from a normal distribution) can be expressed as such a sum, but others (e.g. the sample variance of draws from an exponential distribution) cannot.
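To make the definition concrete, here is a small simulation sketch (standard library only; the function names are made up for illustration). It builds chi-squared draws directly as sums of k squared standard normals and compares the sample mean and variance with the theoretical values k and 2k.

```python
import random
import statistics


def chi_squared_draw(k, rng):
    """One chi-squared(k) draw: the sum of k squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def simulate(k=5, trials=20_000, seed=0):
    """Return the sample mean and sample variance of `trials` chi-squared(k) draws."""
    rng = random.Random(seed)
    draws = [chi_squared_draw(k, rng) for _ in range(trials)]
    return statistics.fmean(draws), statistics.variance(draws)


if __name__ == "__main__":
    mean, var = simulate()
    # Theory: a chi-squared(k) variable has mean k and variance 2k.
    print(f"mean ~ {mean:.2f} (theory 5), variance ~ {var:.2f} (theory 10)")
```

A statistic that cannot be written this way has no reason to follow these moments, which is why its tail probabilities cannot be read off a chi-squared table.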

You'll need to switch to a different test statistic and use that test statistic's distribution (which is unlikely to be chi-squared) to compute your confidence intervals.



Which test statistic should I use? I’ve been trying to figure this out but have been unsuccessful in finding it.


If you can post a detailed explanation of exactly what you're trying to do, and/or your code, I'm happy to try to help you sort it out.


I have a random number function that has an exponentially decreasing probability of generating a given integer within [0, R). So for example, if the range of values is [0, 100), 99 has a 50% probability of being generated, 98 has a 25% chance, and so on.

I’m trying to confirm that if I run this function N times (let’s say 1000), the frequencies of the numbers generated match the expected distribution.
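A rough sketch of that setup, under assumptions not stated in the thread: value v gets probability 2^-(R-v), the leftover 2^-R mass is assigned to 0, and `sample` is only a hypothetical stand-in for the real generator. With N = 1000 most values have an expected count far below 1, so the sketch pools every value whose expected count falls under 5 (a common rule of thumb for Pearson's statistic) into a single tail bin before computing the statistic.

```python
import random


def expected_probs(R):
    """P(v) = 2**-(R - v) for v in [0, R); the leftover 2**-R goes to 0 (assumption)."""
    probs = [2.0 ** -(R - v) for v in range(R)]
    probs[0] += 2.0 ** -R
    return probs


def sample(R, rng):
    """Hypothetical stand-in for the generator under test."""
    v = R - 1
    while v > 0 and rng.random() < 0.5:
        v -= 1
    return v


def pooled_bins(R, n, min_expected=5.0):
    """Values >= cutoff keep their own bin; all values < cutoff share bin 0.

    Assumes R is large enough that some values end up in the pooled tail.
    """
    probs = expected_probs(R)
    cutoff = R
    while cutoff > 0 and n * probs[cutoff - 1] >= min_expected:
        cutoff -= 1
    expected = [n * sum(probs[:cutoff])] + [n * p for p in probs[cutoff:]]
    return cutoff, expected


def pearson_statistic(samples, R, min_expected=5.0):
    """Return (statistic, degrees of freedom) for the pooled bins."""
    cutoff, expected = pooled_bins(R, len(samples), min_expected)
    observed = [0] * len(expected)
    for v in samples:
        observed[max(v - cutoff + 1, 0)] += 1  # index 0 is the pooled tail
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(expected) - 1


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [sample(100, rng) for _ in range(1000)]
    stat, df = pearson_statistic(draws, 100)
    print(f"Pearson statistic = {stat:.2f} on {df} degrees of freedom")
```

With R = 100 and N = 1000 this leaves 8 bins, i.e. 7 degrees of freedom, for which the 5% critical value is about 14.07. Running the test on all 100 unpooled values instead would put dozens of near-empty cells into the statistic, where the chi-squared approximation is unreliable.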


Ok, so the big issue is that statistical tests like the chi-squared test are not designed to show that a sample matches a certain distribution. They are designed to show the opposite: "this sample does not match that distribution".

If the sample matches the distribution, then by design the p-value is (approximately) uniformly distributed, i.e. a p-value of 0.01 is just as likely as a p-value of 0.99. So even a correct generator will fail a test at the 95% level about 5% of the time.
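This can be checked by simulation. The sketch below makes several assumptions: `sample` is a hypothetical generator that really does follow the halving distribution, the 7 most likely values get their own bins and everything else is pooled, and `chi2_sf` is a hand-rolled series for the chi-squared tail probability so that the example needs only the standard library. It repeats the test many times and tallies the p-values by decile; because the data are discrete, the uniformity is only approximate.

```python
import math
import random

R, N, TOP = 100, 1000, 7  # TOP individual bins plus one pooled tail bin (assumption)


def sample(rng):
    """Hypothetical correct generator: R-1 with prob. 1/2, R-2 with 1/4, and so on."""
    v = R - 1
    while v > 0 and rng.random() < 0.5:
        v -= 1
    return v


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the lower incomplete gamma series.

    Adequate for the moderate x values seen here; not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    cdf = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - cdf)


def one_p_value(rng):
    """Run one goodness-of-fit test on N fresh samples and return its p-value."""
    # Bin 0 is the pooled tail; bin i (1..TOP) holds the value R - i.
    expected = [N * 2.0 ** -TOP] + [N * 2.0 ** -i for i in range(1, TOP + 1)]
    observed = [0] * (TOP + 1)
    for _ in range(N):
        i = R - sample(rng)
        observed[i if i <= TOP else 0] += 1
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf(stat, TOP)  # TOP + 1 bins give TOP degrees of freedom


def simulate(runs=1000, seed=0):
    """Return p-value counts per decile and the fraction of runs with p < 0.05."""
    rng = random.Random(seed)
    p_values = [one_p_value(rng) for _ in range(runs)]
    deciles = [0] * 10
    for p in p_values:
        deciles[min(int(p * 10), 9)] += 1
    return deciles, sum(p < 0.05 for p in p_values) / runs


if __name__ == "__main__":
    deciles, rejected = simulate()
    print("p-value counts per decile:", deciles)
    print(f"fraction rejected at the 5% level: {rejected:.3f}")
```

The decile counts should come out roughly equal, and the rejected fraction should sit near 0.05. A single passing run is therefore weak evidence of correctness, while a rejection rate far above 5% across many runs is strong evidence that something is wrong with the generator or with the test setup.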



