The A/B Test Is Inconclusive. Now What?

Required Prior Reading: To get the most out of this article, you should first read these:

  1. Who makes the hypothesis in hypothesis testing?
  2. Solution to NYTimes coin toss puzzle

So you just ran an A/B test between the old version A and a proposed new version B. Of your 200 observations, version A received 90 and version B received 110. Data analysis says there is no statistically significant difference between the two versions. But you were convinced that version B is better (because of its UI design, your prior knowledge, etc.). So should you give up and not roll out version B?

With A/B testing, it is not enough to find that, in your limited sample, Version B performed better than Version A. The difference between the two has to be statistically significant at a preset confidence level. (See Hypothesis testing.)

It is not all for naught if you do not find a statistically significant difference between your Version A and Version B. You can still move your analysis forward with some help from an 18th-century pastor from England: Thomas Bayes.

Note: What follows is a bit heavy on conditional probabilities. But if you hang in there, it is well worth it, so you do not throw away a profitable version! You could move from being 60% certain to 90% certain that version B is the way to go!

Before I get to that, let us start with the statistics that form the basis of an A/B test.

With A/B testing you are using the Chi-square test to see if the observed frequency difference between the two versions is statistically significant. A more detailed and easy-to-read explanation can be found here. You are in fact starting with two hypotheses:

Null hypothesis H0: The difference between the versions is just random.
Alternative hypothesis H1: The difference is significant, such that one version performs better than the other.

You also choose (arbitrarily) a confidence level, or equivalently a p value threshold, at which you want the results to be significant. The most common is the 95% level, or p = 0.05. You then compare your computed Chi-square value with the critical value for that p value at 1 degree of freedom: if it is smaller you retain H0; if it is equal or greater you reject H0 and accept H1.
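As a rough sketch of that test on the numbers from the opening example: the code below assumes that under H0 the 200 observations split evenly between the two versions (100 each), which is my reading of the example rather than something stated in it. The variable names are illustrative, and 3.841 is the standard Chi-square critical value for 1 degree of freedom at p = 0.05.

```python
# Observed counts from the example: version A got 90, version B got 110.
observed = {"A": 90, "B": 110}

# Assumption: under H0 the observations split evenly between the versions.
total = sum(observed.values())
expected = total / len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

# Critical value of Chi-square for 1 degree of freedom at p = 0.05.
CRITICAL_VALUE_05 = 3.841

# Reject H0 only if the statistic reaches the critical value.
reject_h0 = chi_square >= CRITICAL_VALUE_05
print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
```

With these counts the statistic is 2.0, well short of 3.841, so H0 is retained. That is the "inconclusive" result this article starts from.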

A/B test results are inconclusive. Now What?
Let us return to the original problem we started with. Your analysis does not have to end just because of this result. You can move it forward by incorporating your prior knowledge, with help from Thomas Bayes. Bayes' theorem lets you find the probability that your hypothesis is true given the observed data.
Suppose you were 60% confident that version B will perform better. To repeat, we are not saying Version B will get 60%; we are stating that your prior knowledge makes you 60% certain that version B should perform better (i.e., that the difference is statistically significant). That represents your prior.

Then, with your previously computed Chi-square value, instead of testing whether or not it is significant at a p value of 0.05, find the p value at which that Chi-square value becomes significant, and compute 1-p (Smith and Fletcher).

In the example I used, p is about 0.15 and 1-p is about 0.85. In Bayesian terms, 1-p plays the role of the likelihood: the probability of the observed data given the hypothesis.
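One way to find that p value without a statistics package is sketched below. It relies on the fact that a Chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability reduces to the complementary error function. The function name is hypothetical, and the statistic of 2.0 assumes the 90 vs 110 counts are compared against an even 100/100 split.

```python
import math


def chi_square_p_value_1df(chi_square: float) -> float:
    """Upper-tail p value of a Chi-square statistic with 1 degree of freedom."""
    # With 1 df, Chi-square is the square of a standard normal variable,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))


chi_square = 2.0  # statistic for 90 vs 110 against an even split (assumed)
p = chi_square_p_value_1df(chi_square)
print(f"p = {p:.3f}, 1 - p = {1 - p:.3f}")
```

This gives p of roughly 0.157, in line with the rounded 0.15 and 0.85 used here. The shortcut only holds for 1 degree of freedom; a test with more categories would need a general Chi-square distribution function.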

Then the probability that your hypothesis is correct (the posterior) is about 90%:
(.60 * .85)/(.60*.85 + .40*.15) = .51/.57 = .89
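The arithmetic is easy to check in a few lines. This is only a sketch with variable names of my choosing: the prior of 0.60 is weighted by the likelihood 1-p = 0.85, and the remaining 0.40 is weighted by p = 0.15.

```python
prior = 0.60          # P(H): how certain you were before the test
likelihood = 0.85     # P(B|H), taken as 1 - p
p_value = 0.15        # P(B|not H), taken as p

# Total probability of the data under both hypotheses.
evidence = likelihood * prior + p_value * (1 - prior)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = likelihood * prior / evidence
print(f"posterior = {posterior:.3f}")
```

The result is 0.51 / 0.57, or just under 0.90.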

From being 60% certain that version B is better, you have moved to being 90% certain that version B is the way to go. You can now decide to go with version B despite the inconclusive A/B test.

If using the prior “60% certain that the difference is statistically significant” sounds subjective, it is. That is why we do not stop there but improve it with testing. It would help to read my other article on hypothesis testing to understand that there is subjective information in both classical and Bayesian statistics. With an A/B test we treat the probability of the hypothesis (that we set) as 1, whereas in Bayesian statistics we assign it a probability.

For the analytically inclined, here are the details behind this.

With your A/B testing you are assuming the hypothesis as given and finding the probability the data fits the hypothesis. That is conditional probability.

If P(B) is the probability that version B performs better, then P(B|H) is the conditional probability that B occurs given H. With Bayesian statistics you do not do hypothesis testing. Instead, you find the conditional probability of the hypothesis given the data. In this case it is P(H|B). This is what you are interested in when deciding whether or not to go with version B.

Bayes theorem says

P(H|B)  =  (Likelihood  *  Prior )/ P(B)
Likelihood = P(B|H), what we computed as 1-p (see Smith and Fletcher)
Prior = P(H)  – your prior knowledge, the 60% certain we used
P(B) = P(B|H)*P(H) + P(B|H-)*(1-P(H))
P(B|H-) is the probability of B given that the hypothesis is false. In this model it is p, since we are using P(B|H) = 1-p
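Because the result leans on a subjective prior, it is worth seeing how much the posterior moves as the prior changes. The sketch below wraps the formula above in a function (the name hybrid_posterior is hypothetical) and evaluates it for a few priors at the p value of 0.15 from the example.

```python
def hybrid_posterior(prior: float, p_value: float) -> float:
    """P(H|B) under the hybrid model: P(B|H) = 1 - p and P(B|H-) = p."""
    if not 0.0 <= prior <= 1.0 or not 0.0 <= p_value <= 1.0:
        raise ValueError("prior and p_value must be probabilities in [0, 1]")
    likelihood = 1.0 - p_value
    # P(B) = P(B|H)*P(H) + P(B|H-)*(1 - P(H))
    evidence = likelihood * prior + p_value * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("posterior is undefined when P(B) is zero")
    return likelihood * prior / evidence


# Illustrative priors around the 60% used in the article.
for prior in (0.4, 0.5, 0.6, 0.7):
    posterior = hybrid_posterior(prior, 0.15)
    print(f"prior = {prior:.0%} -> posterior = {posterior:.0%}")
```

Note that with a neutral prior of 50% the posterior is exactly 1-p, so in this model the prior is what moves you above or below that figure.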

This is a hybrid approach, using classical statistics (the p value at which you found the A/B test to be significant) together with Bayesian statistics. It is simpler than a fully Bayesian analysis, which is computationally intensive and too much for the task at hand.

The net result is that you are taking the A/B test a step further despite its inconclusive results, and are able to choose the version that is more likely to succeed.

What is that information worth to you?

References and Case Studies:

  1. Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. By: Wagenmakers, Eric-Jan; Lodewyckx, Tom; Kuriyal, Himanshu; Grasman, Raoul. Cognitive Psychology, May 2010, Vol. 60 Issue 3, p158-189
  2. The Art & Science of Interpreting Market Research Evidence – Hardcover (May 5, 2004) by D. V. L. Smith and J. H. Fletcher
  3. Lilford R.  (2003) Reconciling the quantitative and qualitative traditions: the Bayesian approach . Public Money and Management vol 23, 5, pp 2730284
  4. Retzer J (2006) The century of Bayes. I J of Market Research vol 48, issue 1, pp 49-59