Hacker News
by jrdorn 1779 days ago
1. Google Analytics (GA) is very limited as a data source because it samples data and doesn't expose variance. So if using GA, we only support simple binomial metrics, count data (assuming a Poisson distribution), and duration data (assuming an exponential distribution). For SQL data sources and non-parametric data, we currently rely on the central limit theorem (CLT) and treat the sampling distribution of the mean as Normal. There's a good article that goes over the stats in more detail (Itamar, the author, wrote our stats engine) - https://towardsdatascience.com/how-to-do-bayesian-a-b-testin...
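
To make the two cases above concrete, here is a minimal sketch, not Growth Book's actual stats engine. It assumes a uniform Beta(1, 1) prior for binomial metrics and a flat prior for the CLT case; the function names are made up for illustration.

```python
import math
import random


def beta_posterior(conversions, users, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a conversion rate (assumed Beta(1, 1) prior)."""
    return prior_a + conversions, prior_b + (users - conversions)


def chance_to_beat_control_binomial(ctrl, var, draws=20000, seed=0):
    """Monte Carlo estimate of P(variation rate > control rate).

    ctrl and var are (conversions, users) tuples.
    """
    rng = random.Random(seed)
    a_c, b_c = beta_posterior(*ctrl)
    a_v, b_v = beta_posterior(*var)
    wins = sum(
        rng.betavariate(a_v, b_v) > rng.betavariate(a_c, b_c)
        for _ in range(draws)
    )
    return wins / draws


def chance_to_beat_control_normal(mean_c, var_c, n_c, mean_v, var_v, n_v):
    """CLT case: treat the difference of sample means as Normal.

    With a flat prior, P(variation mean > control mean) is the Normal CDF
    evaluated at the standardized difference.
    """
    se = math.sqrt(var_c / n_c + var_v / n_v)
    z = (mean_v - mean_c) / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Note that the Normal version needs a per-variation variance, which is exactly what GA doesn't expose, so only the parametric cases work there.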

2. We have a minimum sample size threshold before we run any statistics on the data. To your point, we don't want to say something is "significant" if it's 5 conversions vs 1. This is one area we're looking to improve with better heuristics. We can't completely take the human out of the loop, but we can help give them all the info they need to make the best decision. On that front, we do show Bayesian expected loss (risk) and credible intervals in addition to just the "chance to beat control".
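
A rough sketch of how a sample size gate plus those three numbers could fit together for a binomial metric. The threshold of 25 conversions, the Beta(1, 1) prior, and the `summarize` name are all assumptions for illustration, not what Growth Book actually uses.

```python
import random

MIN_CONVERSIONS = 25  # hypothetical threshold, not Growth Book's real value


def summarize(ctrl, var, min_conversions=MIN_CONVERSIONS, draws=20000, seed=0):
    """Return Bayesian summary stats, or None if there is too little data.

    ctrl and var are (conversions, users) tuples.
    """
    (c_conv, c_n), (v_conv, v_n) = ctrl, var
    # Refuse to report anything for cases like 5 conversions vs 1.
    if min(c_conv, v_conv) < min_conversions:
        return None

    rng = random.Random(seed)
    # Posterior draws under an assumed uniform Beta(1, 1) prior.
    c = [rng.betavariate(1 + c_conv, 1 + c_n - c_conv) for _ in range(draws)]
    v = [rng.betavariate(1 + v_conv, 1 + v_n - v_conv) for _ in range(draws)]
    diffs = sorted(vi - ci for vi, ci in zip(v, c))

    return {
        # Share of draws where the variation's rate is higher.
        "chance_to_beat_control": sum(d > 0 for d in diffs) / draws,
        # Expected loss (risk): average rate given up if we ship the
        # variation and it turns out to be worse than control.
        "risk": sum(max(-d, 0.0) for d in diffs) / draws,
        # 95% credible interval for the absolute difference in rates.
        "ci_low": diffs[int(0.025 * draws)],
        "ci_high": diffs[int(0.975 * draws) - 1],
    }
```

Calling `summarize((1, 100), (5, 100))` returns `None`, while something like `summarize((100, 1000), (130, 1000))` returns all three metrics so a human can weigh them together.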

Brilliant, thank you.

Can you use the system to analyse the results of tests it didn't run? I.e., if I run tests using some SaaS that only supports frequentist stats, could I use your system as a Bayesian analysis backend?

Yes. As long as the variation assignment data and success metrics are in a supported data source (SQL, GA, or Mixpanel currently), it can be queried and analyzed in Growth Book.
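
For illustration only, this is roughly the shape of data that would be needed from an external tool. The `assignments` and `conversions` tables and their columns are hypothetical, and the query is just a sketch of the idea (shown with an in-memory SQLite database), not the SQL that Growth Book generates.

```python
import sqlite3

# Hypothetical schema: one row per user assignment, one row per conversion event.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (user_id TEXT PRIMARY KEY, variation TEXT);
CREATE TABLE conversions (user_id TEXT);
INSERT INTO assignments VALUES
    ('u1', 'control'), ('u2', 'control'),
    ('u3', 'variation'), ('u4', 'variation');
INSERT INTO conversions VALUES ('u2'), ('u3'), ('u4'), ('u4');
""")


def counts_by_variation(conn):
    """Return {variation: (converted_users, assigned_users)}."""
    rows = conn.execute("""
        SELECT a.variation,
               COUNT(DISTINCT a.user_id) AS users,
               COUNT(DISTINCT c.user_id) AS converted
        FROM assignments a
        LEFT JOIN conversions c ON c.user_id = a.user_id
        GROUP BY a.variation
        ORDER BY a.variation
    """).fetchall()
    return {variation: (converted, users) for variation, users, converted in rows}
```

The per-variation counts that come out of a query like this are all a Bayesian analysis of a binomial metric needs, regardless of which tool ran the test.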