CTR Metric¶
The goal of this post is to provide a complete manual for deriving the asymptotic distribution of click-through rate (CTR). We aim at correct theoretical derivations, including verification of all assumptions. Since CTR is one of the primary metrics in A/B testing, we derive the asymptotic distribution for the absolute and relative difference in CTR between two variants - control and treatment.
Theory¶
We will use two core statistical tools - the central limit theorem (CLT) and the delta method. There is a great YouTube video from Khan Academy explaining the CLT; we highly recommend it!
Central limit theorem¶
Let \(X_{1}, \dots, X_{n}\) be a random sample of size \(n\) - a sequence of \(n\) independent and identically distributed random variables drawn from a distribution with expected value \(\mu\) and finite variance \(\sigma^{2}\). Denote the sample average as \(\bar{X_{n}} = \frac{1}{n} \sum_{i=1}^{n} X_{i}\). Then

$$ \sqrt{n} \, \big( \bar{X_{n}} - \mu \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma^{2}), $$
i.e. as \(n\) approaches infinity, the random variables \(\sqrt{n} \, \big( \bar{X_{n}} - \mu \big)\) converge in distribution to a normal \(\mathcal{N} (0, \sigma^{2})\). Other acceptable, but slightly vague, formulations are

$$ \bar{X_{n}} \stackrel{\text{approx.}}{\sim} \mathcal{N} \Big( \mu, \frac{\sigma^{2}}{n} \Big) $$

or

$$ \sum_{i=1}^{n} X_{i} \stackrel{\text{approx.}}{\sim} \mathcal{N} \big( n \mu, n \sigma^{2} \big). $$
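The statement is easy to check numerically. Below is a minimal simulation sketch in Python (NumPy; the seed and sample sizes are arbitrary choices for illustration), drawing from a heavily skewed exponential distribution with \(\mu = 1\) and \(\sigma^{2} = 1\):

```python
import numpy as np

rng = np.random.default_rng(42)     # hypothetical seed
n = 1_000                           # sample size
n_repeats = 10_000                  # number of simulated sample means

# Draw from a heavily skewed distribution, Exp(1): mu = 1, sigma^2 = 1.
samples = rng.exponential(scale=1.0, size=(n_repeats, n))

# By the CLT, sqrt(n) * (X_bar - mu) should be approximately N(0, sigma^2) = N(0, 1).
scaled = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

print(scaled.mean(), scaled.std())  # mean near 0, standard deviation near 1
```

Despite the strong skew of the underlying distribution, the scaled sample means already look very close to standard normal at this sample size.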
Delta Method - Univariate¶
Assume any sequence of random variables \(\{T_{n}\}_{n=1}^{\infty}\) satisfying

$$ \sqrt{n} \, \big( T_{n} - \mu \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma^{2}) $$

and a function \(g: \mathbb{R} \rightarrow \mathbb{R}\) which has a continuous derivative around a point \(\mu\), i.e. \(g^{'}(\mu)\) is continuous. Then

$$ \sqrt{n} \, \big( g(T_{n}) - g(\mu) \big) \stackrel{d}{\longrightarrow} \mathcal{N} \big( 0, \sigma^{2} \, [g^{'}(\mu)]^{2} \big). $$
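A quick numerical sanity check of the univariate statement: a sketch assuming \(T_{n}\) is the mean of an Exp(1) sample (so \(\mu = 1\), \(\sigma^{2} = 1\)) and \(g(x) = x^{2}\), hence \(g^{'}(\mu) = 2\) and the limiting variance should be \(\sigma^{2} [g^{'}(\mu)]^{2} = 4\):

```python
import numpy as np

rng = np.random.default_rng(0)       # hypothetical seed
n, n_repeats = 2_000, 5_000

# T_n = mean of an Exp(1) sample, so mu = 1 and sigma^2 = 1.
t_n = rng.exponential(scale=1.0, size=(n_repeats, n)).mean(axis=1)

# Delta method with g(x) = x^2: sqrt(n) * (T_n^2 - 1) should be approx. N(0, 4).
scaled = np.sqrt(n) * (t_n ** 2 - 1.0)

print(scaled.std())                  # should be close to sqrt(4) = 2
```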
Delta Method - Multivariate¶
Assume any sequence of random vectors \(\{ \pmb{T}_{n} \}_{n=1}^{\infty}\) satisfying

$$ \sqrt{n} \, \big( \pmb{T}_{n} - \pmb{\mu} \big) \stackrel{d}{\longrightarrow} \mathcal{N}_{k} (\pmb{0}, \Sigma) $$

and a function \(g: \mathbb{R^{k}} \rightarrow \mathbb{R^{p}}\) which is continuously differentiable around a point \(\pmb{\mu}\). Denote the Jacobian \(\mathbb{D}(x) = \frac{\partial \, g(x)}{\partial \, x}\). Then

$$ \sqrt{n} \, \big( g(\pmb{T}_{n}) - g(\pmb{\mu}) \big) \stackrel{d}{\longrightarrow} \mathcal{N}_{p} \big( \pmb{0}, \mathbb{D}(\pmb{\mu}) \, \Sigma \, \mathbb{D}(\pmb{\mu})^{T} \big). $$
CTR Definition¶
Without loss of generality, we can focus only on the control group with \(K\) users. Every user can see the test screen multiple times, denoted by \(N_{i}, \, i = 1, \dots, K\). \(N_{i} \in \mathbb{N}\) is a discrete random variable with unknown probability distribution and finite variance. Next, the user can click on the screen. This action is denoted by the Bernoulli random variable \(Y_{i, j}, \, i = 1, \dots, K, \, j = 1, \dots, N_{i}\)

$$ Y_{i, j} = \begin{cases} 1 & \text{if user } i \text{ clicked on view } j, \\ 0 & \text{otherwise.} \end{cases} $$
Click-through rate (CTR) is then defined as the sum of all clicks divided by the sum of all views

$$ \mathrm{CTR} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{N_{i}} Y_{i, j}}{\sum_{i=1}^{K} N_{i}}. $$
We want to derive the asymptotic distribution for CTR. But we cannot directly use the central limit theorem, since its assumptions are violated: the random variables \(Y_{i, j}\) are neither independent nor identically distributed. We can use a little trick1 and simply reformulate the CTR definition without any change:

$$ \mathrm{CTR} = \frac{\sum_{i=1}^{K} S_{i}}{\sum_{i=1}^{K} N_{i}} = \frac{\frac{1}{K} \sum_{i=1}^{K} S_{i}}{\frac{1}{K} \sum_{i=1}^{K} N_{i}} = \frac{\bar{S}}{\bar{N}}, \qquad S_{i} = \sum_{j=1}^{N_{i}} Y_{i, j}, $$
where \(\bar{S} = \frac{1}{K} \sum_{i=1}^{K} S_{i}\) stands for average clicks per user and \(\bar{N} = \frac{1}{K} \sum_{i=1}^{K} N_{i}\) stands for average views per user. Users are independent of each other, so the random variables \(N_{i}, \, i = 1, \dots, K\) are independent and identically distributed. For simplification we will assume that the random variables \(S_{i}, \, i = 1, \dots, K\) are also independent and identically distributed, but that is only half true. The \(S_{i}\) are independent, since users are independent of each other, but they are not identically distributed - \(S_{1}\) has some unknown discrete distribution on the closed interval \([0, N_{1}]\), \(S_{2}\) has some unknown discrete distribution on the closed interval \([0, N_{2}]\), and so on, where \(N_{i} \in \mathbb{N}, \, i = 1, \dots, K\) are themselves random variables, so \(P \big(N_{i} = N_{j} \big) \neq 1\) for \(i \neq j\). There exist other versions of the central limit theorem which only assume independence, e.g. the Lyapunov CLT.
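To make the reformulation concrete, here is a small sketch under a purely hypothetical data model (Poisson-distributed views per user, an independent click with probability 0.1 on each view); the two CTR formulas agree exactly:

```python
import numpy as np

rng = np.random.default_rng(7)                  # hypothetical seed

K = 1_000                                       # number of users
views = rng.poisson(lam=4, size=K) + 1          # N_i >= 1, views per user
clicks = rng.binomial(n=views, p=0.1)           # S_i, total clicks of user i

# Original definition: all clicks divided by all views.
ctr_sums = clicks.sum() / views.sum()

# Reformulated definition: average clicks per user over average views per user.
ctr_means = clicks.mean() / views.mean()

assert np.isclose(ctr_sums, ctr_means)
```

The trick changes nothing numerically; it only rewrites CTR as a function of the two user-level averages \(\bar{S}\) and \(\bar{N}\), for which a CLT is available.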
Asymptotic Distribution of CTR¶
In this part we will derive the asymptotic distribution for CTR. CTR is defined as a fraction of two random variables - \(\bar{S}\) and \(\bar{N}\). We will proceed in two steps:
- We will use CLT and derive asymptotic distributions for both \(\bar{S}\) and \(\bar{N}\).
- We will use delta method - multivariate and derive asymptotic distribution for CTR.
Step 1¶
Since \(S_{1}, \dots, S_{K}\) is a random sample, from the CLT we have

$$ \sqrt{K} \, \big( \bar{S} - \mu_{S} \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma_{S}^{2}). $$
Since \(N_{1}, \dots, N_{K}\) is a random sample, from the CLT we similarly have

$$ \sqrt{K} \, \big( \bar{N} - \mu_{N} \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma_{N}^{2}). $$
We can join both asymptotic normal distributions into a two dimensional normal distribution

$$ \sqrt{K} \, \Bigg( \begin{pmatrix} \bar{S} \\ \bar{N} \end{pmatrix} - \begin{pmatrix} \mu_{S} \\ \mu_{N} \end{pmatrix} \Bigg) \stackrel{d}{\longrightarrow} \mathcal{N}_{2} \Bigg( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{S}^{2} & \sigma_{SN} \\ \sigma_{SN} & \sigma_{N}^{2} \end{pmatrix} \Bigg), $$
where \(\sigma_{SN}\) is the covariance between the random variables \(S\) and \(N\), defined as \(\sigma_{SN} = \mathrm{cov}(S,N) = \mathbb{E} \big[(S - \mu_{S})(N - \mu_{N})\big]\). The unknown covariance \(\sigma_{SN}\) can be easily estimated using the following formula

$$ \hat{\sigma}_{SN} = \frac{1}{K - 1} \sum_{i=1}^{K} \big( S_{i} - \bar{S} \big) \big( N_{i} - \bar{N} \big). $$
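In code the estimate is one line; a sketch with the same hypothetical data model as above (Poisson views, Bernoulli clicks), cross-checked against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(7)                  # hypothetical seed
K = 1_000
views = rng.poisson(lam=4, size=K) + 1          # N_i
clicks = rng.binomial(n=views, p=0.1)           # S_i

# Unbiased sample covariance: sum((S_i - S_bar) * (N_i - N_bar)) / (K - 1).
cov_manual = ((clicks - clicks.mean()) * (views - views.mean())).sum() / (K - 1)

# np.cov uses the same (K - 1) normalization; the off-diagonal entry matches.
cov_np = np.cov(clicks, views)[0, 1]

assert np.isclose(cov_manual, cov_np)
```

Note that the covariance is naturally positive here: users with more views tend to accumulate more clicks.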
Step 2¶
Now we apply the multivariate delta method with a link function \(g: \mathbb{R}^2 \rightarrow \mathbb{R}\) defined as \(g(x, y) = \frac{x}{y}\). The gradient at the point \((\mu_{S}, \mu_{N})\) equals

$$ \mathbb{D}(\mu_{S}, \mu_{N}) = \Big( \frac{1}{\mu_{N}}, \, -\frac{\mu_{S}}{\mu_{N}^{2}} \Big). $$
Hence we have

$$ \sqrt{K} \, \bigg( \frac{\bar{S}}{\bar{N}} - \frac{\mu_{S}}{\mu_{N}} \bigg) \stackrel{d}{\longrightarrow} \mathcal{N} \big( 0, \, \mathbb{D}(\mu_{S}, \mu_{N}) \, \Sigma \, \mathbb{D}(\mu_{S}, \mu_{N})^{T} \big). $$

Denoting \(\bar{Y} = \frac{\bar{S}}{\bar{N}}\) and \(\mu_{Y} = \frac{\mu_{S}}{\mu_{N}}\), the asymptotic distribution for CTR in the control group with \(K\) observations equals
$$ \sqrt{K} \, \bigg( \bar{Y} - \mu_{Y} \bigg) \stackrel{d}{\longrightarrow} \mathcal{N} \bigg(0, \, \frac{1}{\mu_{N}^2} \big(\sigma_{S}^2 - 2\frac{\mu_{S}}{\mu_{N}}\sigma_{SN} + \frac{\mu_{S}^2}{\mu_{N}^2} \sigma_{N}^2 \big) \bigg),$$ as \(K\) approaches infinity.
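In practice the unknown moments are replaced by their sample counterparts. A plug-in sketch (same hypothetical data model as above) estimating the standard error of the CTR estimate:

```python
import numpy as np

rng = np.random.default_rng(7)                  # hypothetical seed
K = 1_000
views = rng.poisson(lam=4, size=K) + 1          # N_i
clicks = rng.binomial(n=views, p=0.1)           # S_i

s_bar, n_bar = clicks.mean(), views.mean()
var_s, var_n = clicks.var(ddof=1), views.var(ddof=1)
cov_sn = np.cov(clicks, views)[0, 1]

ctr = s_bar / n_bar                             # plug-in estimate of mu_S / mu_N

# Asymptotic variance from the formula above, divided by K to get the
# variance of the CTR estimate itself.
var_ctr = (var_s - 2 * ctr * cov_sn + ctr ** 2 * var_n) / (n_bar ** 2 * K)
se_ctr = var_ctr ** 0.5
```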
Difference Between Control and Treatment Group¶
We have derived the asymptotic distribution for the control group. Analogously we can derive the asymptotic distribution for the treatment group. Let's write them both once again

$$ \sqrt{K} \, \big( \bar{Y}_{A} - \mu_{A} \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma_{A}^{2}), \qquad \sqrt{L} \, \big( \bar{Y}_{B} - \mu_{B} \big) \stackrel{d}{\longrightarrow} \mathcal{N} (0, \sigma_{B}^{2}), $$
where \(\sigma_{A}^2\) and \(\sigma_{B}^2\) follow the derivation right above (the complicated formula) and \(K\) and \(L\) are the numbers of observations in the control and treatment group respectively.
Since we again have two asymptotic normal distributions, we can join them into a two dimensional normal distribution

$$ \begin{pmatrix} \bar{Y}_{A} \\ \bar{Y}_{B} \end{pmatrix} \stackrel{\text{approx.}}{\sim} \mathcal{N}_{2} \Bigg( \begin{pmatrix} \mu_{A} \\ \mu_{B} \end{pmatrix}, \begin{pmatrix} \frac{\sigma_{A}^{2}}{K} & 0 \\ 0 & \frac{\sigma_{B}^{2}}{L} \end{pmatrix} \Bigg). $$

This time we used the slightly different (approximate) notation, and we do need to be careful now: in general the sample sizes differ (\(K \neq L\)). On the other hand, in this case we assume there is no correlation between the two distributions, see the zeros in the covariance matrix.
In A/B testing we are usually interested in whether the difference between treatment and control group is statistically significant. We derive asymptotic distribution for both absolute and relative difference.
Absolute Difference¶
The absolute difference is easier. We will use the multivariate delta method with a simple link function \(g: \mathbb{R}^2 \rightarrow \mathbb{R}\) defined as \(g(x, y) = y - x\). Be aware of the order of \(x\) and \(y\) - it is \(y - x\), not \(x - y\). The gradient at the point \((\mu_{A}, \mu_{B})\) equals

$$ \mathbb{D}(\mu_{A}, \mu_{B}) = \big( -1, \, 1 \big) $$

and hence the result is

$$ \bar{Y}_{B} - \bar{Y}_{A} \stackrel{\text{approx.}}{\sim} \mathcal{N} \Big( \mu_{B} - \mu_{A}, \, \frac{\sigma_{A}^{2}}{K} + \frac{\sigma_{B}^{2}}{L} \Big). $$
It can be written in the following form

$$ Z_{K, L} = \frac{\big( \bar{Y}_{B} - \bar{Y}_{A} \big) - \big( \mu_{B} - \mu_{A} \big)}{\sqrt{\frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L}}} \stackrel{d}{\longrightarrow} \mathcal{N} (0, 1), $$

if \(K, L \rightarrow \infty\) and \(\frac{K}{L} \rightarrow q \in (0, \infty)\), where \(S_{A}^2\) and \(S_{B}^2\) are the sample variances (consistent estimates of \(\sigma_{A}^{2}\) and \(\sigma_{B}^{2}\)).
The two sided asymptotic confidence interval for the absolute difference equals

$$ \bigg( \bar{Y}_{B} - \bar{Y}_{A} - u_{1 - \alpha / 2} \sqrt{\frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L}}, \;\; \bar{Y}_{B} - \bar{Y}_{A} + u_{1 - \alpha / 2} \sqrt{\frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L}} \bigg), $$

where \(u_{1 - \alpha / 2}\) is the \((1 - \alpha / 2)\)-quantile of the standard normal distribution \(\mathcal{N}(0, 1)\).
The p-value equals

$$ p = 2 \big(1 - \Phi(|z|) \big), $$ where \(\Phi\) is the distribution function of \(\mathcal{N}(0, 1)\) and \(z\) is the observed value of the test statistic \(Z_{K,L}\).
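Putting the absolute-difference test together: a sketch, again under the hypothetical per-user data model used earlier (the helper function and both groups' parameters are illustrative, not part of the derivation above):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)                      # hypothetical seed

def ctr_and_var(views, clicks):
    """CTR estimate and the delta-method variance of that estimate."""
    k = len(views)
    ctr = clicks.mean() / views.mean()
    cov_sn = np.cov(clicks, views)[0, 1]
    num = clicks.var(ddof=1) - 2 * ctr * cov_sn + ctr ** 2 * views.var(ddof=1)
    return ctr, num / (views.mean() ** 2 * k)

# Hypothetical control (A) and treatment (B); B clicks with a higher probability.
views_a = rng.poisson(4, size=4_000) + 1
views_b = rng.poisson(4, size=5_000) + 1
clicks_a = rng.binomial(views_a, 0.10)
clicks_b = rng.binomial(views_b, 0.12)

ctr_a, var_a = ctr_and_var(views_a, clicks_a)
ctr_b, var_b = ctr_and_var(views_b, clicks_b)

diff = ctr_b - ctr_a
se = (var_a + var_b) ** 0.5

z = diff / se                                       # H0: mu_B - mu_A = 0
p = 2 * (1 - NormalDist().cdf(abs(z)))

u = NormalDist().inv_cdf(0.975)                     # u_{1 - alpha/2}, alpha = 0.05
ci = (diff - u * se, diff + u * se)
```

Here the confidence interval excludes zero and the p-value is small, as expected for a true lift of two percentage points at these sample sizes.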
In practice, Welch's test is usually used, which uses the t-distribution instead of the normal distribution, with \(f\) degrees of freedom given by

$$ f = \frac{\Big( \frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L} \Big)^{2}}{\frac{S_{A}^{4}}{K^{2} (K - 1)} + \frac{S_{B}^{4}}{L^{2} (L - 1)}}. $$
Then the two sided asymptotic confidence interval (with t-quantiles) for the absolute difference equals

$$ \bigg( \bar{Y}_{B} - \bar{Y}_{A} - t_{f, 1 - \alpha / 2} \sqrt{\frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L}}, \;\; \bar{Y}_{B} - \bar{Y}_{A} + t_{f, 1 - \alpha / 2} \sqrt{\frac{S_{A}^{2}}{K} + \frac{S_{B}^{2}}{L}} \bigg), $$

where \(t_{f, 1 - \alpha / 2}\) is the \((1 - \alpha / 2)\)-quantile of the t-distribution with \(f\) degrees of freedom.
The p-value equals

$$ p = 2 \big(1 - \text{CDF}_{t, f}(|z|) \big), $$ where \(\text{CDF}_{t,f}\) is the cumulative distribution function of the t-distribution with \(f\) degrees of freedom and \(z\) is the observed value of the test statistic \(Z_{K,L}\).
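The degrees-of-freedom formula above is the Welch-Satterthwaite approximation; a minimal sketch (here `var_a`, `var_b` stand for the sample variances \(S_{A}^{2}, S_{B}^{2}\)):

```python
def welch_df(var_a: float, k: int, var_b: float, l: int) -> float:
    """Welch-Satterthwaite degrees of freedom for sample variances
    var_a, var_b and sample sizes k, l."""
    v_a, v_b = var_a / k, var_b / l
    return (v_a + v_b) ** 2 / (v_a ** 2 / (k - 1) + v_b ** 2 / (l - 1))

# With equal variances and equal sizes this reduces to k + l - 2,
# the classical two-sample t-test degrees of freedom.
f = welch_df(1.0, 100, 1.0, 100)
```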
Relative Difference¶
To derive the asymptotic distribution for the relative difference we will again use the multivariate delta method, with a link function \(g: \mathbb{R}^2 \rightarrow \mathbb{R}\) defined as \(g(x, y) = \frac{y - x}{x}\). Again, be aware of the order of \(x\) and \(y\). The gradient at the point \((\mu_{A}, \mu_{B})\) equals

$$ \mathbb{D}(\mu_{A}, \mu_{B}) = \Big( -\frac{\mu_{B}}{\mu_{A}^{2}}, \, \frac{1}{\mu_{A}} \Big). $$

The result is

$$ \frac{\bar{Y}_{B} - \bar{Y}_{A}}{\bar{Y}_{A}} \stackrel{\text{approx.}}{\sim} \mathcal{N} \Bigg( \frac{\mu_{B} - \mu_{A}}{\mu_{A}}, \; \frac{\mu_{B}^{2}}{\mu_{A}^{4}} \frac{\sigma_{A}^{2}}{K} + \frac{1}{\mu_{A}^{2}} \frac{\sigma_{B}^{2}}{L} \Bigg). $$
This can be rewritten in the following form

$$ Z_{K, L} = \frac{\frac{\bar{Y}_{B} - \bar{Y}_{A}}{\bar{Y}_{A}} - \frac{\mu_{B} - \mu_{A}}{\mu_{A}}}{\sqrt{\frac{\bar{Y}_{B}^{2}}{\bar{Y}_{A}^{4}} \frac{S_{A}^{2}}{K} + \frac{1}{\bar{Y}_{A}^{2}} \frac{S_{B}^{2}}{L}}} \stackrel{d}{\longrightarrow} \mathcal{N} (0, 1), $$

if \(K, L \rightarrow \infty\) and \(\frac{K}{L} \rightarrow q \in (0, \infty)\), where \(S_{A}^2\) and \(S_{B}^2\) are the sample variances.
For the unknown true relative difference \(\frac{\mu_{B} - \mu_{A}}{\mu_{A}}\) we can derive a confidence interval. For simplicity let's denote the sample variance as \(\tilde{S}^2\), i.e.

$$ \tilde{S}^{2} = \frac{\bar{Y}_{B}^{2}}{\bar{Y}_{A}^{4}} \frac{S_{A}^{2}}{K} + \frac{1}{\bar{Y}_{A}^{2}} \frac{S_{B}^{2}}{L}. $$

Finally, the two sided asymptotic confidence interval for the relative difference equals

$$ \bigg( \frac{\bar{Y}_{B} - \bar{Y}_{A}}{\bar{Y}_{A}} - u_{1 - \alpha / 2} \, \tilde{S}, \;\; \frac{\bar{Y}_{B} - \bar{Y}_{A}}{\bar{Y}_{A}} + u_{1 - \alpha / 2} \, \tilde{S} \bigg). $$
The p-value equals

$$ p = 2 \big(1 - \Phi(|z|) \big), $$ where \(\Phi\) is the distribution function of \(\mathcal{N}(0, 1)\) and \(z\) is the observed value of the test statistic \(Z_{K,L}\).
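A numeric sketch of the relative-difference interval and p-value, plugging in illustrative (made-up) per-group CTR estimates and their delta-method variances:

```python
from statistics import NormalDist

# Illustrative numbers, as if produced by the per-group delta-method step:
ctr_a, var_a = 0.100, 4.5e-06       # control CTR estimate and Var(ctr_a)
ctr_b, var_b = 0.120, 4.2e-06       # treatment CTR estimate and Var(ctr_b)

rel_diff = (ctr_b - ctr_a) / ctr_a  # estimated relative lift (+20 %)

# Delta-method variance of the relative difference; the gradient of
# g(x, y) = (y - x) / x at (ctr_a, ctr_b) is (-ctr_b / ctr_a^2, 1 / ctr_a).
var_rel = (ctr_b ** 2 / ctr_a ** 4) * var_a + var_b / ctr_a ** 2
se_rel = var_rel ** 0.5             # this plays the role of S-tilde

u = NormalDist().inv_cdf(0.975)
ci = (rel_diff - u * se_rel, rel_diff + u * se_rel)

z = rel_diff / se_rel               # H0: relative difference is zero
p = 2 * (1 - NormalDist().cdf(abs(z)))
```

Note how the uncertainty of the denominator \(\bar{Y}_{A}\) inflates the variance of the relative difference compared to the absolute one.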
As far as we know, there is no straightforward approximation using t-quantiles here, because there is no formula for the degrees of freedom. In practice, we have a huge number of observations, and the normal and t-quantiles are very close to each other for large \(n\).