Statistical nitty-gritty
The fixed-sample, sequential, and Bayesian methods all start from the same place: estimating the mean ($\mu_\Delta$) and variance ($\sigma_\Delta^2$) of the lift. The mean is an estimate of the size of the treatment effect (as a percentage of the baseline metric value), and the variance indicates how reliable that estimate is for predicting what will happen if you ship the treatment.
Once we have an estimate of the observed lift (both its mean and variance), we can use that to construct a confidence interval that describes the plausible values for the true lift.
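As a rough illustration of how a mean and variance of the lift can be turned into an interval, here is a minimal sketch that assumes the lift estimate is approximately normally distributed. That assumption, the function name `lift_confidence_interval`, and the example numbers are all hypothetical and not taken from the methods described here; the sequential and Bayesian methods in particular may construct their intervals differently.

```python
from statistics import NormalDist


def lift_confidence_interval(lift_mean, lift_variance, confidence=0.95):
    """Two-sided interval for the lift under a normal approximation.

    lift_mean and lift_variance are the estimated mean and variance of
    the lift; the normal approximation is an assumption of this sketch.
    """
    if lift_variance < 0:
        raise ValueError("variance must be non-negative")
    # Critical value for a two-sided interval, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * lift_variance ** 0.5
    return lift_mean - half_width, lift_mean + half_width


# Made-up example: a 5% estimated lift with variance 0.0004 (std. dev. 2%).
low, high = lift_confidence_interval(0.05, 0.0004)
print(f"95% interval for the lift: [{low:.4f}, {high:.4f}]")
```

A smaller variance gives a narrower interval, which is the sense in which the variance indicates how reliable the estimate is.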
Estimating lift
First, some notation:
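- For a given metric, we observe $C_i$ for each subject $i$ in the control group of size $n_C$, and $T_j$ for each subject $j$ in the treatment group of size $n_T$. The set of all observations in the control group is $C$, and for the treatment group it is $T$.
- The true population mean (which we don't know!) of the metric among the control group is $\mu_C$, and among the treatment group it is $\mu_T$. Similarly, the true (unobserved) population variances are $\sigma_C^2$ and $\sigma_T^2$.
- $\bar{C}$ and $\bar{T}$ are the averages of $C_i$ and $T_j$ across all subjects in the control group and treatment group, respectively:
  $$\bar{C} = \frac{1}{n_C} \sum_{i=1}^{n_C} C_i, \qquad \bar{T} = \frac{1}{n_T} \sum_{j=1}^{n_T} T_j$$
- $s_C^2$ and $s_T^2$ are the sample variances of $C_i$ and $T_j$ for the control group and treatment group, respectively:
  $$s_C^2 = \frac{1}{n_C - 1} \sum_{i=1}^{n_C} \left(C_i - \bar{C}\right)^2, \qquad s_T^2 = \frac{1}{n_T - 1} \sum_{j=1}^{n_T} \left(T_j - \bar{T}\right)^2$$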
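The sample statistics described in the notation above can be computed along the following lines. This is a minimal sketch using only the Python standard library; the helper name `summarize`, the toy data, and the use of the unbiased ($n - 1$) denominator for the sample variance are assumptions made for illustration. The last line computes the observed relative lift as the difference in sample means divided by the control mean, matching the description of lift as a percentage of the baseline metric value.

```python
from statistics import fmean, variance


def summarize(observations):
    """Return (group size, sample mean, sample variance) for one group.

    statistics.variance divides by n - 1 (Bessel's correction), which is
    an assumption of this sketch.
    """
    return len(observations), fmean(observations), variance(observations)


# Hypothetical per-subject metric values for each group.
control = [10.0, 12.0, 9.0, 11.0, 8.0]
treatment = [11.0, 13.0, 12.0, 10.0, 14.0]

n_c, mean_c, var_c = summarize(control)
n_t, mean_t, var_t = summarize(treatment)

# Observed lift relative to the control (baseline) mean.
observed_lift = (mean_t - mean_c) / mean_c

print(n_c, mean_c, var_c)   # 5 10.0 2.5
print(n_t, mean_t, var_t)   # 5 12.0 2.5
print(observed_lift)        # 0.2
```

These per-group summaries are the only inputs needed for the observed lift; the raw observations do not have to be kept once the sizes, means, and variances are known.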