Types of Statistical Errors

Type I and type II errors

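Type I error (false positive): rejecting a null hypothesis that is actually true
- Rolling out an A/B test variant that is in fact no better, or even worse, than the baseline
Type II error (false negative): failing to reject a null hypothesis that is actually false
- Rolling back an A/B test variant that was in fact better than the baseline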
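
A rough way to see both error types is to simulate many A/B tests where the true conversion rates are known. The sketch below is only an illustration: the rates (10% vs 12%), the 500 samples per arm, the 5% significance threshold, and the helper names p_value and rejection_rate are all assumptions chosen for the example, and the test used is a standard two-proportion z-test.

```python
import random
from math import sqrt, erfc

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation at all, so no evidence of a difference
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return erfc(abs(z) / sqrt(2))

def rejection_rate(rate_a, rate_b, n=500, experiments=1000, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # Each simulated user converts with the true rate of their group.
        conv_a = sum(rng.random() < rate_a for _ in range(n))
        conv_b = sum(rng.random() < rate_b for _ in range(n))
        if p_value(conv_a, n, conv_b, n) < alpha:
            rejections += 1
    return rejections / experiments

# Null hypothesis is true (same rate): every rejection is a Type I error.
type_1_rate = rejection_rate(0.10, 0.10)
# Null hypothesis is false (B is better): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(0.10, 0.12)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions the Type I rate should land near the chosen threshold, while the Type II rate is high because 500 samples per arm is small for detecting a 2-point difference.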

A/B Testing

Some users see one version (A)
Other users see another version (B)

Measurements of...

Samples (users, sessions, impressions)
Conversions (number of clicks or goal completions)

Why do we need testing instead of a direct comparison?

If the sample size is too small, an observed difference may be due to chance rather than a real effect
- Flipping a fair coin 10 times may give 6 heads and 4 tails, or 7 tails and 3 heads, even though the true probability of heads is 50%.
- We need a better decision-making framework
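
The coin example can be made concrete with the binomial distribution. This is a minimal sketch, assuming a fair coin (p = 0.5); the function names binomial_pmf and prob_at_least are made up for the illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Chance that a fair coin shows at least 6 (or at least 7) heads in 10 flips.
print(f"P(>= 6 heads in 10 flips) = {prob_at_least(6, 10):.4f}")  # 386/1024
print(f"P(>= 7 heads in 10 flips) = {prob_at_least(7, 10):.4f}")  # 176/1024
```

A fair coin gives 6 or more heads in roughly 38% of 10-flip runs, so a 60/40 split on 10 samples says very little about which side is really ahead.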

Bayesian A/B Testing

Bayesian A/B Testing Calculator

Uses Bayesian inference (Bayes' theorem)
- Why?
  - It can handle problems where there is relatively little data
- What is different?
  - We set a prior probability distribution (e.g., a uniform distribution)
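
One common way to apply this to conversion data is the Beta-Binomial model: a uniform prior is Beta(1, 1), and after observing the data the posterior for each conversion rate is Beta(1 + conversions, 1 + non-conversions). The sketch below estimates the probability that B beats A by sampling from both posteriors. The function name prob_b_beats_a, the example counts, and the number of draws are assumptions for illustration, not the method of any particular calculator.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior = Beta(1 + conversions, 1 + non-conversions) for each group.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical data: A converts 10 of 100 users, B converts 30 of 100.
print(f"P(B > A) = {prob_b_beats_a(10, 100, 30, 100):.3f}")
```

The result is read directly as "the probability that B is better than A given the data and the prior", which stays interpretable even when the sample is small.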

Null hypothesis significance testing

Test the following null hypothesis
- Two or more groups follow essentially the same distribution (e.g., a uniform distribution)
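
For two groups of conversion data, one standard way to test this null hypothesis is a pooled two-proportion z-test. In this sketch, "same distribution" is narrowed to "same conversion rate", which is an assumption; the function name two_proportion_z_test and the example counts are also made up for illustration.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error is zero.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, estimated from the pooled data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical data: A converts 100 of 1000 users, B converts 150 of 1000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value (for example, below a pre-chosen threshold such as 0.05) is taken as grounds to reject the null hypothesis; a large one means the data are compatible with the groups being the same.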