1. GA (Google Analytics) is very limited as a data source because it samples data and doesn't expose variance. So if you're using GA, we only support simple binomial metrics, count data (assuming a Poisson distribution), and duration data (assuming an exponential distribution). For SQL data sources and non-parametric data, we currently rely on the CLT and treat the sampling distribution of the mean as normal. There's a good article that goes over the stats in more detail (Itamar, the author, wrote our stats engine) - https://towardsdatascience.com/how-to-do-bayesian-a-b-testin...
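
To make the distributional assumptions in point 1 concrete, here is a rough Python sketch. It is not the actual stats engine code: the function names, the flat-ish default priors, and the choice of textbook conjugate updates are all assumptions for illustration. The conjugate updates need only totals and counts, which is one plausible reason these three metric types remain workable when the source exposes no variance. The last function shows a plain normal approximation for the case where mean, variance, and count are available per variation.

```python
import math
from statistics import NormalDist


def beta_posterior(conversions, users, prior_a=1.0, prior_b=1.0):
    """Binomial metric: Beta prior on the conversion rate -> Beta(a, b) posterior."""
    return prior_a + conversions, prior_b + users - conversions


def gamma_posterior_for_counts(total_count, users, prior_shape=1.0, prior_rate=1.0):
    """Count metric: Gamma prior on the Poisson rate -> Gamma(shape, rate) posterior."""
    return prior_shape + total_count, prior_rate + users


def gamma_posterior_for_durations(total_duration, events, prior_shape=1.0, prior_rate=1.0):
    """Duration metric: Gamma prior on the Exponential rate -> Gamma(shape, rate) posterior."""
    return prior_shape + events, prior_rate + total_duration


def clt_chance_to_beat_control(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Normal approximation: P(mean_b > mean_a) from per-variation mean, variance, count."""
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    return 1.0 - NormalDist(mean_b - mean_a, std_err).cdf(0.0)
```

For example, `beta_posterior(10, 100)` gives the parameters of a Beta posterior for 10 conversions out of 100 users under a uniform prior, and `clt_chance_to_beat_control` returns 0.5 when both variations have identical summary statistics.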

2. We have a minimum sample size threshold before we run any statistics on the data. To your point, we don't want to say something is "significant" if it's 5 conversions vs 1. This is one area we're looking to improve with better heuristics. We can't completely take the human out of the loop, but we can help give them all the info they need to make the best decision. On that front, we do show Bayesian expected loss (risk) and credible intervals in addition to just the "chance to beat control".
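
The three numbers mentioned in point 2 can be sketched for a simple conversion metric as below. This is an illustration under stated assumptions, not the real implementation: `MIN_CONVERSIONS`, the uniform Beta(1, 1) prior, the Monte Carlo approach, and the definition of risk as the expected drop in conversion rate from shipping the variation when control is actually better are all choices made here for the example.

```python
import random

MIN_CONVERSIONS = 25  # hypothetical threshold; the real value is not given above


def summarize(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Compare variation B against control A using Beta posteriors and sampling."""
    # Refuse to report statistics on tiny samples (e.g. 5 conversions vs 1).
    if min(conv_a, conv_b) < MIN_CONVERSIONS:
        return None

    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        diffs.append(rate_b - rate_a)
    diffs.sort()

    return {
        # Share of posterior draws in which B's rate exceeds A's.
        "chance_to_beat_control": sum(d > 0 for d in diffs) / draws,
        # Expected loss from choosing B: average shortfall when A is better.
        "risk": sum(max(-d, 0.0) for d in diffs) / draws,
        # Central 95% credible interval for the difference in rates.
        "credible_interval": (diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]),
    }
```

With the 5-vs-1 example from above, `summarize(5, 100, 1, 100)` returns `None` rather than a verdict, while something like `summarize(100, 1000, 150, 1000)` returns all three figures so a person can weigh the chance of winning against the size of the possible loss.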