Using Ep-Stats from Python¶
We can use ep-stats as a regular Python package to evaluate any experiment from any data. We can define arbitrary goals and metrics as long as we are able to select the goals from our primary data store.
Please make sure to read and understand the basic Principles of EP before using this notebook.
Evaluate¶
We define an experiment with a single Click-through Rate metric to evaluate.
We load testing pre-aggregated goals data using TestData.load_goals_agg.
See Experiment.evaluate_agg for details.
```python
from epstats.toolkit import Experiment, Metric, SrmCheck

experiment = Experiment(
    'test-conversion',
    'a',
    [
        Metric(
            1,
            'Click-through Rate',
            'count(test_unit_type.unit.click)',
            'count(test_unit_type.global.exposure)',
        ),
    ],
    [SrmCheck(1, 'SRM', 'count(test_unit_type.global.exposure)')],
    unit_type='test_unit_type',
)
```
```python
# This gets testing data; use another Dao or obtain aggregated goals in some other way.
from epstats.toolkit.testing import TestData

goals = TestData.load_goals_agg(experiment.id)

# evaluate experiment
ev = experiment.evaluate_agg(goals)
```
Number of exposures per variant.
```python
ev.exposures
```
| | exp_variant_id | exposures | exp_id |
|---|---|---|---|
| 0 | a | 21.0 | test-conversion |
| 1 | b | 26.0 | test-conversion |
| 2 | c | 30.0 | test-conversion |
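As a quick sanity check before reading the metrics, the observed traffic split can be compared with the experiment's designed allocation. A minimal pandas sketch, rebuilding the `ev.exposures` dataframe shown above by hand:

```python
import pandas as pd

# The `ev.exposures` dataframe from above, rebuilt by hand.
exposures = pd.DataFrame({
    'exp_variant_id': ['a', 'b', 'c'],
    'exposures': [21.0, 26.0, 30.0],
    'exp_id': ['test-conversion'] * 3,
})

# Observed share of traffic per variant; compare with the designed split.
exposures['share'] = exposures['exposures'] / exposures['exposures'].sum()
print(exposures[['exp_variant_id', 'share']])
```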
Metric evaluations; see Evaluation.metric_columns for the meaning of each column.
```python
ev.metrics
```
| | timestamp | exp_id | metric_id | metric_name | exp_variant_id | count | mean | std | sum_value | confidence_level | diff | test_stat | p_value | confidence_interval | standard_error | degrees_of_freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1607977256 | test-conversion | 1 | Click-through Rate | a | 21 | 0.238095 | 0.436436 | 5 | 0.95 | 0 | 0 | 1 | 1.14329 | 0.565685 | 40 |
| 1 | 1607977256 | test-conversion | 1 | Click-through Rate | b | 26 | 0.269231 | 0.452344 | 7 | 0.95 | 0.130769 | 0.223152 | 1 | 1.23275 | 0.586008 | 43.5401 |
| 2 | 1607977256 | test-conversion | 1 | Click-through Rate | c | 30 | 0.3 | 0.466092 | 9 | 0.95 | 0.26 | 0.420806 | 1 | 1.35281 | 0.617862 | 44.9314 |
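Because `ev.metrics` comes back as a plain pandas dataframe, standard pandas filtering applies. A sketch using a few columns of the output above; the 0.05 significance cutoff is our choice for illustration, not anything prescribed by ep-stats:

```python
import pandas as pd

# A few columns of the `ev.metrics` output above, rebuilt by hand.
metrics = pd.DataFrame({
    'exp_variant_id': ['a', 'b', 'c'],
    'mean': [0.238095, 0.269231, 0.300000],
    'diff': [0.000000, 0.130769, 0.260000],
    'p_value': [1.0, 1.0, 1.0],
})

# Keep treatment variants and test the relative diff for significance.
treatments = metrics[metrics['exp_variant_id'] != 'a']
significant = treatments[treatments['p_value'] < 0.05]
print(len(significant))  # 0 - no significant difference in this testing data
```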
SRM check results; a p-value < 0.001 signals a problem in experiment randomization. See Sample Ratio Mismatch Check for details.
```python
ev.checks
```
| | timestamp | exp_id | check_id | check_name | variable_id | value |
|---|---|---|---|---|---|---|
| 0 | 1607977256 | test-conversion | 1 | SRM | p_value | 0.452844 |
| 1 | 1607977256 | test-conversion | 1 | SRM | test_stat | 1.584416 |
| 2 | 1607977256 | test-conversion | 1 | SRM | confidence_level | 0.999000 |
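The checks dataframe is in long format (one row per variable), so pivoting it makes the p-value rule from above a single comparison. A sketch using the values shown:

```python
import pandas as pd

# The `ev.checks` output above, rebuilt by hand.
checks = pd.DataFrame({
    'check_id': [1, 1, 1],
    'check_name': ['SRM', 'SRM', 'SRM'],
    'variable_id': ['p_value', 'test_stat', 'confidence_level'],
    'value': [0.452844, 1.584416, 0.999000],
})

# One column per variable, one row per check.
srm = checks.pivot(index='check_id', columns='variable_id', values='value')
srm_problem = bool(srm.loc[1, 'p_value'] < 0.001)
print(srm_problem)  # False - randomization looks fine here
```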
How to Prepare Goals Dataframe¶
You have to prepare the goals input dataframe from your data, following the description at either Experiment.evaluate_agg or Experiment.evaluate_by_unit.
The goals dataframe must contain data to evaluate all metrics. Per-user metrics require a two-step aggregation: first group by with the experiment randomization unit id (unit_id) column included to get correct values of sum_sqr_count and sum_sqr_value, then group by again without it to get the pre-aggregated data.
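To see why the per-unit pass matters, take two users in one variant who clicked 2 and 3 times: sum_sqr_count must be the sum of per-user squares, 2² + 3² = 13, which cannot be recovered once the counts are summed (5² = 25 would be wrong). A toy sketch of the two group-bys; the raw-event layout here is our own assumption:

```python
import pandas as pd

# Hypothetical raw click events: user u1 clicked twice, u2 three times.
events = pd.DataFrame({
    'unit_id': ['u1', 'u1', 'u2', 'u2', 'u2'],
    'cnt': [1, 1, 1, 1, 1],
})

# Step 1: group by the randomization unit id to get per-user sums.
per_user = events.groupby('unit_id', as_index=False)['cnt'].sum()

# Step 2: aggregate across users, squaring the per-user sums first.
count = per_user['cnt'].sum()                 # 5
sum_sqr_count = (per_user['cnt'] ** 2).sum()  # 2**2 + 3**2 = 13
```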
This is an example of the goals dataframe used to evaluate the experiment test-conversion above.
```python
goals['date'] = '2020-08-01'
goals['count_unique'] = goals['count']
goals
```
| | exp_id | date | exp_variant_id | unit_type | agg_type | goal | dimension | dimension_value | element | count | sum_sqr_count | sum_value | sum_sqr_value | count_unique |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test-conversion | 2020-08-01 | a | test_unit_type | unit | click | | | NaN | 5 | 5 | 5 | 5 | 5 |
| 1 | test-conversion | 2020-08-01 | b | test_unit_type | unit | click | | | NaN | 7 | 7 | 7 | 7 | 7 |
| 2 | test-conversion | 2020-08-01 | c | test_unit_type | unit | click | | | NaN | 9 | 9 | 9 | 9 | 9 |
| 3 | test-conversion | 2020-08-01 | a | test_unit_type | global | exposure | | | NaN | 21 | 21 | 21 | 21 | 21 |
| 4 | test-conversion | 2020-08-01 | b | test_unit_type | global | exposure | | | NaN | 26 | 26 | 26 | 26 | 26 |
| 5 | test-conversion | 2020-08-01 | c | test_unit_type | global | exposure | | | NaN | 30 | 30 | 30 | 30 | 30 |
The following SQL pseudo-code shows how we first aggregate data per experiment unit id (to get per-user aggregates) and then aggregate again without unit id to get the pre-aggregated goals dataframe.
"""
SELECT
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
SUM(sum_cnt) count,
SUM(sum_cnt * sum_cnt) sum_sqr_count,
SUM(value) sum_value,
SUM(value * value) sum_sqr_value,
CAST(SUM(unique) AS Int64) count_unique
FROM (
SELECT
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
unit_id,
SUM(cnt) sum_cnt,
SUM(value) value,
IF(SUM(cnt) > 0, 1, 0) unique
FROM events.table
GROUP BY
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
unit_id
) u
GROUP BY
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value
""";