by crystal_revenge
691 days ago
> And pulling granular data into a Python environment and fitting a regression is much less efficient than calculating aggregated statistics like mean and variance.

This is not true. You almost never need to perform logistic regression on individual observations. Consider that estimating a single Bernoulli rv on N observations is the same as estimating a single Binomial rv with k successes out of N trials. Most common statistical software (e.g. statsmodels) supports this grouped format. If all of our covariates are discrete categories (which is typically the case for A/B tests), then you only need to run the regression on as many rows as there are unique configurations of the variables. That is, if you're running an A/B test on 10 million users across 50 states and 2 variants, you only need 100 observations (one per state-variant combination) for your final model.
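Why the grouped and ungrouped fits agree can be seen from the log-likelihood. This is a sketch, assuming every observation i in group g shares the same covariate vector x_g, with N_g trials, k_g = sum of y_i over the group, and p_g = sigma(x_g^T beta). The notation is introduced here for illustration only.

```latex
\begin{aligned}
\ell_{\text{Bernoulli}}(\beta)
  &= \sum_{g} \sum_{i \in g} \Big[ y_i \log p_g + (1 - y_i) \log(1 - p_g) \Big] \\
  &= \sum_{g} \Big[ k_g \log p_g + (N_g - k_g) \log(1 - p_g) \Big], \\
\ell_{\text{Binomial}}(\beta)
  &= \sum_{g} \Big[ \log \binom{N_g}{k_g} + k_g \log p_g + (N_g - k_g) \log(1 - p_g) \Big].
\end{aligned}
```

The two differ only by the sum of the log binomial coefficients, which does not depend on beta. They therefore have the same maximizer and the same curvature, and so the same coefficient estimates and standard errors.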
|
Interesting, I didn't know this about statsmodels. But maybe the documentation is a bit misleading: "A nobs x k array where nobs is the number of observations and k is the number of regressors". Source: https://www.statsmodels.org/stable/generated/statsmodels.gen...
I would be grateful for references on how to use statsmodels to fit a logistic model using only aggregated statistics. Or not statsmodels; any references will do.