Ad-hoc A/B test evaluation using Ep-Stats
This is a simplified version of the general manual Using Ep-Stats in Jupyter. Here we assume a simple DataFrame at the input, containing aggregated data of an A/B test in a wide format.
Next we define the metrics and checks we are interested in. Finally, we evaluate the experiment and nicely format the results.
Input DataFrame Example
Mind that you need to prepare the experiment data on your own; the following example is only illustrative.
You should be aware of the following assumptions:
- The first two columns must contain the name of the experiment and the variant. The names of these columns may vary.
- For continuous metrics like Revenue per Mille (RPM), it is necessary to also download squared values. If you forget to do so, the results will be wrong and misleading, and Ep-Stats will not warn you about this issue!
```python
# This is only an example to show the required format of the input DataFrame.
# You have to prepare the aggregated data on your own, e.g. using SQL.
from epstats.toolkit.testing import TestData

goals = TestData.load_goals_simple_agg()
goals
```
| | experiment | variant | views | clicks | conversions | bookings | bookings_squared |
|---|---|---|---|---|---|---|---|
| 0 | test-simple-metric | a | 473661 | 48194 | 413 | 17152 | 803105 |
| 1 | test-simple-metric | b | 471485 | 47184 | 360 | 14503 | 677178 |
For continuous metrics like RPM (RPM = bookings / views * 1000), it is necessary to prepare squared values; in this case we have the columns `bookings` and `bookings_squared`.
Let's assume we have $K$ purchases. The exact definitions of the columns `bookings` and `bookings_squared` are then the following:
$$\text{bookings} = \sum_{i=1}^{K} \text{purchase_value}_{i}$$ $$\text{bookings_squared} = \sum_{i=1}^{K} (\text{purchase_value}_{i})^2$$
This is not necessary for binary metrics like Click-through Rate or Conversion Rate.
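As an illustration of how these two columns can be produced, here is a minimal pandas sketch. The purchase-level `purchases` DataFrame and its column names are hypothetical; in practice you would typically run this aggregation in SQL on your own data.

```python
import pandas as pd

# Hypothetical purchase-level data; in practice you aggregate on your own, e.g. in SQL.
purchases = pd.DataFrame({
    'experiment': ['test-simple-metric'] * 4,
    'variant': ['a', 'a', 'b', 'b'],
    'purchase_value': [10.0, 20.0, 5.0, 15.0],
})

# bookings = sum of purchase values, bookings_squared = sum of squared purchase values
agg = (
    purchases
    .assign(purchase_value_squared=lambda df: df['purchase_value'] ** 2)
    .groupby(['experiment', 'variant'], as_index=False)
    .agg(
        bookings=('purchase_value', 'sum'),
        bookings_squared=('purchase_value_squared', 'sum'),
    )
)
# variant a: bookings = 30.0, bookings_squared = 500.0
# variant b: bookings = 20.0, bookings_squared = 250.0
```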
Experiment Definition and Evaluation
Firstly, you need to define the metrics you want to evaluate. You can define as many metrics as you want. When creating an instance of the class `SimpleMetric`, you need to specify the parameters `id`, `name`, `numerator` and `denominator`. You can further specify the optional parameter `metric_format`, e.g. `'${:,.1f}'` for RPM, and the parameter `metric_value_multiplier`, e.g. `1000` for RPM. The last optional parameter, `unit_type`, is present only for technical reasons. Be aware that only one `unit_type` can be used within one experiment. The value of this parameter has no impact on the evaluation.
Secondly, you can define checks by creating an instance of the class `SimpleSrmCheck`. Defining checks is not mandatory; keep the list empty if you do not need any.
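For intuition, a sample ratio mismatch (SRM) check is essentially a chi-square goodness-of-fit test of the observed unit counts against the expected traffic split. Below is a minimal sketch assuming an intended 50/50 split; this is an illustration only, not Ep-Stats' internal implementation.

```python
from scipy import stats

# Observed views per variant, taken from the example data above
views = [473661, 471485]
expected = [sum(views) / 2] * 2  # expected counts under an assumed 50/50 split

chi2, p_value = stats.chisquare(views, f_exp=expected)
# A very small p-value (many teams flag below a strict threshold such as 0.001)
# would indicate a sample ratio mismatch.
```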
You wrap both metric and check definitions inside the `Experiment` definition. For more details see `Experiment`.
Finally, you evaluate the experiment by calling the method `evaluate_wide_agg`; for details see `Experiment.evaluate_wide_agg()`. The results for metrics and checks are separated.
```python
from epstats.toolkit import Experiment, SimpleMetric, SimpleSrmCheck

unit_type = 'test_unit_type'  # this is only a technical detail; it has no impact on the results

# Experiment definition
experiment = Experiment(
    'test-simple-metric',
    'a',
    [
        SimpleMetric(1, 'Click-through Rate (CTR)', 'clicks', 'views', unit_type),
        SimpleMetric(2, 'Conversion Rate', 'conversions', 'views', unit_type),
        SimpleMetric(3, 'Revenue per Mille (RPM)', 'bookings', 'views', unit_type, metric_format='${:,.2f}', metric_value_multiplier=1000),
    ],
    [SimpleSrmCheck(1, 'SRM', 'views')],
    unit_type=unit_type,
)

# Experiment evaluation
# `goals` is the DataFrame you have prepared on your own, e.g. using SQL
ev = experiment.evaluate_wide_agg(goals)

# Results
ev.checks
ev.metrics
```
| | timestamp | exp_id | metric_id | metric_name | exp_variant_id | count | mean | std | sum_value | confidence_level | diff | test_stat | p_value | confidence_interval | standard_error | degrees_of_freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1615928648 | test-simple-metric | 1 | Click-through Rate (CTR) | a | 473661 | 0.101748 | 0.302317 | 48194 | 0.95 | 0 | 0 | 1 | 0.0119665 | 0.00610546 | 947320 |
| 1 | 1615928648 | test-simple-metric | 1 | Click-through Rate (CTR) | b | 471485 | 0.100075 | 0.300101 | 47184 | 0.95 | -0.0164385 | -2.72161 | 0.00649657 | 0.0118382 | 0.00603998 | 945136 |
| 2 | 1615928648 | test-simple-metric | 2 | Conversion Rate | a | 473661 | 0.000871932 | 0.0295156 | 413 | 0.95 | 0 | 0 | 1 | 0.136333 | 0.0695586 | 947320 |
| 3 | 1615928648 | test-simple-metric | 2 | Conversion Rate | b | 471485 | 0.000763545 | 0.0276218 | 360 | 0.95 | -0.124306 | -1.96949 | 0.048897 | 0.123705 | 0.063116 | 941568 |
| 4 | 1615928648 | test-simple-metric | 3 | Revenue per Mille (RPM) | a | 473661 | 0.0362116 | 1.30162 | 17152 | 0.95 | 0 | 0 | 1 | 0.144766 | 0.0738616 | 947320 |
| 5 | 1615928648 | test-simple-metric | 3 | Revenue per Mille (RPM) | b | 471485 | 0.0307603 | 1.19805 | 14503 | 0.95 | -0.15054 | -2.29841 | 0.0215384 | 0.128373 | 0.0654974 | 939408 |
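The `mean` and `diff` columns can be sanity-checked by hand: for each metric, `mean` is numerator / denominator per variant, and `diff` is the relative difference of a variant against the control. A quick check against the CTR and RPM rows above:

```python
# CTR means: clicks / views
mean_a = 48194 / 473661  # variant a, ~0.101748
mean_b = 47184 / 471485  # variant b, ~0.100075

# `diff` is the relative impact of b against control a, ~-0.0164 (i.e. -1.6%)
rel_diff_b = mean_b / mean_a - 1

# RPM mean of variant a with metric_value_multiplier=1000, ~36.21 (shown as $36.21)
rpm_a = 17152 / 473661 * 1000
```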
Formatting Results
You may find two methods useful for a nice presentation of the results: `results_long_to_wide` and `format_results`.
The former simply converts the results from long format to wide format. The latter then provides extra tuning: you can set the number of decimals by defining the parameters `format_pct` and `format_pval` respectively.
```python
from epstats.toolkit.results import results_long_to_wide, format_results

ev.metrics.pipe(results_long_to_wide)
```
| exp_id | exp_variant_id | CTR mean | CTR diff | CTR conf_int_lower | CTR conf_int_upper | CTR p_value | Conversion Rate mean | Conversion Rate diff | Conversion Rate conf_int_lower | Conversion Rate conf_int_upper | Conversion Rate p_value | RPM mean | RPM diff | RPM conf_int_lower | RPM conf_int_upper | RPM p_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| test-simple-metric | A | 0.101748 | 0 | -0.0119665 | 0.0119665 | 1 | 0.000871932 | 0 | -0.136333 | 0.136333 | 1 | 0.0362116 | 0 | -0.144766 | 0.144766 | 1 |
| test-simple-metric | B | 0.100075 | -0.0164385 | -0.0282766 | -0.00460032 | 0.00649657 | 0.000763545 | -0.124306 | -0.248012 | -0.000601163 | 0.048897 | 0.0307603 | -0.15054 | -0.278913 | -0.0221675 | 0.0215384 |
```python
ev.metrics.pipe(results_long_to_wide).pipe(format_results, experiment, format_pct='{:.1%}', format_pval='{:.3f}')
```
| Experiment Id | Variant | CTR Mean | CTR Impact | CTR CI lower | CTR CI upper | CTR p-value | Conversion Rate Mean | Conversion Rate Impact | Conversion Rate CI lower | Conversion Rate CI upper | Conversion Rate p-value | RPM Mean | RPM Impact | RPM CI lower | RPM CI upper | RPM p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| test-simple-metric | A | 10.17% | 0.0% | -1.2% | 1.2% | 1.000 | 0.09% | 0.0% | -13.6% | 13.6% | 1.000 | $36.21 | 0.0% | -14.5% | 14.5% | 1.000 |
| test-simple-metric | B | 10.01% | -1.6% | -2.8% | -0.5% | 0.006 | 0.08% | -12.4% | -24.8% | -0.1% | 0.049 | $30.76 | -15.1% | -27.9% | -2.2% | 0.022 |