Hacker News
by scuba_man_spiff 3815 days ago
Your comment hits the nail on the head here.

Standard statistical tests used in A/B testing are based on one check, made at a sample size fixed in advance. If you check a test repeatedly until you get a 'significant' result, your chance of getting a false positive is many times the stated significance level.
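
To make the peeking problem concrete, here is a rough simulation sketch of an A/A test (both arms share the same true conversion rate, so any 'significant' result is a false positive). The 10% conversion rate, the 10 evenly spaced looks, the sample sizes, and the function names are all assumptions picked for illustration, and the test used is a plain two-proportion z-test at the 5% level.

```python
import math
import random

Z_CRIT = 1.96  # two-sided critical value for a 5% significance level


def is_significant(conv_a, conv_b, n):
    """Two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    return abs(conv_a / n - conv_b / n) / se > Z_CRIT


def simulate(n_sims=1000, n_per_arm=1000, looks=10, rate=0.1, seed=0):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = 0
    single_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        flagged = False
        for n in range(1, n_per_arm + 1):
            # Both arms draw from the same rate: there is no real difference.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peek at every checkpoint; a single hit counts as "calling" the test.
            if n % step == 0 and is_significant(conv_a, conv_b, n):
                flagged = True
        peeking_hits += flagged
        single_hits += is_significant(conv_a, conv_b, n_per_arm)
    return peeking_hits / n_sims, single_hits / n_sims


if __name__ == "__main__":
    peeking, single = simulate()
    print(f"false positive rate, stop at first significant peek: {peeking:.3f}")
    print(f"false positive rate, one check at the end:          {single:.3f}")
```

Under these assumptions the single final check should land near the nominal 5%, while the stop-at-first-significant-peek rate should come out noticeably higher. The exact figures depend on the seed and on how many looks you take.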

Best practice - set a pre-defined end, and one or two defined early check-in points where you only make an early call if the result is overwhelmingly significant or if the business has fallen off a cliff.

1 comments

That would be best practice if you insist on using null hypothesis significance testing and only want to use classical frequentist statistics. We really can do much better these days with multi-armed bandits, and by focusing on effect sizes and credible intervals rather than a yes/no answer to a hypothesis.
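
As a rough illustration of the effect-size-and-credible-interval approach, here is a minimal sketch that assumes a uniform Beta(1, 1) prior on each arm's conversion rate and approximates the posterior of the difference by Monte Carlo sampling. The function name, the prior, and the example counts are assumptions for illustration, not a recommendation of any particular tool.

```python
import random


def lift_credible_interval(conv_a, n_a, conv_b, n_b,
                           draws=20000, level=0.95, seed=0):
    """Credible interval for (rate_b - rate_a) and P(rate_b > rate_a).

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        - rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    # Equal-tailed interval read off the sorted posterior draws.
    lo = diffs[int((1 - level) / 2 * draws)]
    hi = diffs[int((1 + level) / 2 * draws) - 1]
    prob_b_better = sum(d > 0 for d in diffs) / draws
    return lo, hi, prob_b_better


if __name__ == "__main__":
    # Hypothetical counts: 100/1000 conversions for A, 130/1000 for B.
    lo, hi, p = lift_credible_interval(100, 1000, 130, 1000)
    print(f"95% credible interval for the lift: [{lo:.4f}, {hi:.4f}]")
    print(f"P(B beats A) = {p:.3f}")
```

Instead of a yes/no verdict, this reports a plausible range for the size of the difference plus the probability that B is actually better, which is usually closer to what the business decision needs.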
Sounds like I need to update my stats knowledge! Do you happen to know of a good place to start learning about today's state of the art?