Using Ep-Stats from Python¶
We can use ep-stats as a regular Python package to evaluate any experiment from any data. We can define arbitrary goals and metrics as long as we are able to select the goals from our primary data store.
Please make sure to read and understand the basic Principles of EP before using this notebook.
Evaluate¶
We define an experiment with one Click-through Rate metric to evaluate. We load pre-aggregated testing goals data using TestData.load_goals_agg. See Experiment.evaluate_agg for details.
from epstats.toolkit import Experiment, Metric, SrmCheck

# experiment 'test-conversion' with control variant 'a',
# one Click-through Rate metric and an SRM check
experiment = Experiment(
    'test-conversion',
    'a',
    [
        Metric(
            1,
            'Click-through Rate',
            'count(test_unit_type.unit.click)',
            'count(test_unit_type.global.exposure)',
        ),
    ],
    [SrmCheck(1, 'SRM', 'count(test_unit_type.global.exposure)')],
    unit_type='test_unit_type',
)
# This gets testing data; use another Dao or prepare aggregated goals in some other way.
from epstats.toolkit.testing import TestData

goals = TestData.load_goals_agg(experiment.id)

# evaluate the experiment
ev = experiment.evaluate_agg(goals)
Number of exposures per variant.
ev.exposures
|   | exp_variant_id | exposures | exp_id |
|---|---|---|---|
| 0 | a | 21.0 | test-conversion |
| 1 | b | 26.0 | test-conversion |
| 2 | c | 30.0 | test-conversion |
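As a quick sanity check we can compute the observed traffic split from the exposures above. This is plain pandas on the returned dataframe, not a dedicated ep-stats API:

observed_split = ev.exposures.set_index('exp_variant_id')['exposures']
observed_split / observed_split.sum()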
Metric evaluations; see Evaluation.metric_columns for the meaning of the columns.
ev.metrics
|   | timestamp | exp_id | metric_id | metric_name | exp_variant_id | count | mean | std | sum_value | confidence_level | diff | test_stat | p_value | confidence_interval | standard_error | degrees_of_freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1607977256 | test-conversion | 1 | Click-through Rate | a | 21 | 0.238095 | 0.436436 | 5 | 0.95 | 0 | 0 | 1 | 1.14329 | 0.565685 | 40 |
| 1 | 1607977256 | test-conversion | 1 | Click-through Rate | b | 26 | 0.269231 | 0.452344 | 7 | 0.95 | 0.130769 | 0.223152 | 1 | 1.23275 | 0.586008 | 43.5401 |
| 2 | 1607977256 | test-conversion | 1 | Click-through Rate | c | 30 | 0.3 | 0.466092 | 9 | 0.95 | 0.26 | 0.420806 | 1 | 1.35281 | 0.617862 | 44.9314 |
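To read the results programmatically we can filter the metrics dataframe by the columns shown above, for example keeping only treatment rows whose difference against control is statistically significant at the configured confidence level. This is a plain-pandas sketch, not an ep-stats API:

significant = ev.metrics[
    (ev.metrics['exp_variant_id'] != 'a')  # 'a' is the control variant
    & (ev.metrics['p_value'] < 1 - ev.metrics['confidence_level'])
]
significant[['exp_variant_id', 'metric_name', 'diff', 'p_value', 'confidence_interval']]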
SRM check results; a p-value < 0.001 signals a problem in experiment randomization. See Sample Ratio Mismatch Check for details.
ev.checks
|   | timestamp | exp_id | check_id | check_name | variable_id | value |
|---|---|---|---|---|---|---|
| 0 | 1607977256 | test-conversion | 1 | SRM | p_value | 0.452844 |
| 1 | 1607977256 | test-conversion | 1 | SRM | test_stat | 1.584416 |
| 2 | 1607977256 | test-conversion | 1 | SRM | confidence_level | 0.999000 |
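Using the p-value < 0.001 rule above, we can turn the check into a simple guard before trusting the metric results. A plain-pandas sketch over the checks dataframe shown above:

srm_p_value = ev.checks.loc[
    (ev.checks['check_name'] == 'SRM') & (ev.checks['variable_id'] == 'p_value'),
    'value',
].iloc[0]
if srm_p_value < 0.001:
    print('SRM detected, do not trust the metric results')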
How to Prepare Goals Dataframe¶
You have to prepare the goals input dataframe from your data following the description of either Experiment.evaluate_agg or Experiment.evaluate_by_unit.
The goals dataframe must contain data to evaluate all metrics. For per-user metrics you first have to group by a column with the experiment randomization unit id (unit_id) to get correct values of sum_sqr_count and sum_sqr_value, and then group by again without it to get the pre-aggregated data.
This is an example of the goals dataframe used to evaluate experiment test-conversion above.
# add date and count_unique columns to complete the example goals dataframe
goals['date'] = '2020-08-01'
goals['count_unique'] = goals['count']
goals
|   | exp_id | date | exp_variant_id | unit_type | agg_type | goal | dimension | dimension_value | element | count | sum_sqr_count | sum_value | sum_sqr_value | count_unique |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test-conversion | 2020-08-01 | a | test_unit_type | unit | click | | | NaN | 5 | 5 | 5 | 5 | 5 |
| 1 | test-conversion | 2020-08-01 | b | test_unit_type | unit | click | | | NaN | 7 | 7 | 7 | 7 | 7 |
| 2 | test-conversion | 2020-08-01 | c | test_unit_type | unit | click | | | NaN | 9 | 9 | 9 | 9 | 9 |
| 3 | test-conversion | 2020-08-01 | a | test_unit_type | global | exposure | | | NaN | 21 | 21 | 21 | 21 | 21 |
| 4 | test-conversion | 2020-08-01 | b | test_unit_type | global | exposure | | | NaN | 26 | 26 | 26 | 26 | 26 |
| 5 | test-conversion | 2020-08-01 | c | test_unit_type | global | exposure | | | NaN | 30 | 30 | 30 | 30 | 30 |
The following SQL pseudo-code shows how we first aggregate data per experiment unit id (to get per-user aggregates) and then aggregate again without the unit id to get the pre-aggregated goals dataframe. A pandas sketch of the same aggregation follows below.
"""
SELECT
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
SUM(sum_cnt) count,
SUM(sum_cnt * sum_cnt) sum_sqr_count,
SUM(value) sum_value,
SUM(value * value) sum_sqr_value,
CAST(SUM(unique) AS Int64) count_unique
FROM (
SELECT
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
unit_id,
SUM(cnt) sum_cnt,
SUM(value) value,
IF(SUM(cnt) > 0, 1, 0) unique
FROM events.table
GROUP BY
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value,
unit_id
) u
GROUP BY
exp_id,
exp_variant_id,
unit_type,
agg_type,
goal,
dimension,
dimension_value
""";