8 Flaws in A/B Split Testing

You have been using A/B split testing to improve your mail campaigns and web designs. The core idea is to randomly assign participants to group A or B and measure the resulting performance, usually in terms of conversion. You then run a statistical test, either a t-test (incorrect) or a Chi-square test, to see whether the difference in performance between A and B is statistically significant at the 95% confidence level.
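For reference, the Chi-square step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular test tool: the function name `chi_square_2x2` and the visitor counts are made up for the example, and it uses the plain Pearson statistic without a continuity correction.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson Chi-square test on a 2x2 table of conversions vs. non-conversions.

    Assumes both columns have a non-zero total (at least one conversion and
    one non-conversion overall); otherwise an expected count would be zero.
    """
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if A and B converted at the same rate.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper tail of the Chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 100 of 1,000 convert on A, 150 of 1,000 on B.
chi2, p_value = chi_square_2x2(100, 1000, 150, 1000)
print(f"chi2 = {chi2:.2f}, significant at 95%: {p_value < 0.05}")
```

Note what the function returns: a test statistic and a p-value. Nothing in it says how much better one version is, or what the difference is worth.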

There are significant flaws with this approach:

  1. Large Samples: With large samples, even small differences are likely to come out statistically significant. When you use large samples (larger than 300), you also lose segmentation differences.
  2. Focus on Statistical Significance: Every test tool, sample size calculator and article is narrowly focused on achieving statistical significance, treating it as the final word on the superiority of one version over the other.
  3. Ignoring Economic Significance: Whether or not there is statistical significance, no test tool will tell you the economic significance of the result for your decision making.
  4. Misleading Metrics: When tools report that version A is X% better than version B, they are simply wrong. The hypothesis test used in A/B testing asks only whether one version is better than the other, not by what percent.
  5. All or nothing: When the test results are inconclusive, there is nothing to learn from these tests.
  6. Discontinuous: There is no carryover of knowledge gained from previous tests. We do not apply any knowledge gained from a test in later tests.
  7. Test Everything and Test Often: The method wrests control from the decision maker in the name of being “data driven”. This pushes you to suspend all prior knowledge (because it is dismissed as hunches and intuition) and to test everything and test often, resulting in significant costs for minor improvements. Realize that the test tool makers have an incentive to keep you testing regularly and excessively.
  8. Mistaking “X implies Y” for “Y implies X”: The way hypothesis testing is used is flawed. What we test is, “how well does the data fit the hypothesis that we assumed”. But at the end of the test we state, “the hypothesis is supported by the data and is true for all future data”.
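The first flaw in the list is easy to see with numbers. The sketch below holds two conversion rates fixed and only grows the sample. The rates (2.0% and 2.2%) and the sample sizes are hypothetical, and the helper `p_value_2x2` is an illustrative name; it uses the closed-form Pearson Chi-square for a 2x2 table without a continuity correction.

```python
import math

def p_value_2x2(conv_a, n_a, conv_b, n_b):
    """p-value of the Pearson Chi-square test for a 2x2 conversion table."""
    a, b = conv_a, n_a - conv_a   # version A: converted, did not convert
    c, d = conv_b, n_b - conv_b   # version B: converted, did not convert
    n = n_a + n_b
    # Closed form of the Chi-square statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the Chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical rates: the difference never changes, only the sample size does.
RATE_A, RATE_B = 0.020, 0.022

results = {}
for n_per_arm in (1_000, 10_000, 100_000):
    conv_a = round(n_per_arm * RATE_A)
    conv_b = round(n_per_arm * RATE_B)
    results[n_per_arm] = p_value_2x2(conv_a, n_per_arm, conv_b, n_per_arm)
    verdict = "significant" if results[n_per_arm] < 0.05 else "not significant"
    print(f"n per arm = {n_per_arm:>7,}: p = {results[n_per_arm]:.4f} ({verdict})")
```

With these made-up numbers the same 0.2 point difference goes from not significant to significant purely because the sample grew. The test still says nothing about whether that difference is worth acting on.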

The root cause of all these mistakes is using A/B testing for decision making. When you are deciding between two versions, you are deciding which option will deliver better returns. The uncertainty is about which version that is. If there were no uncertainty at all, why bother?

The way to reduce uncertainty is to collect relevant information. It is profitable to do so only if the cost to collect this information is less than the expected increase in return from reducing the uncertainty.
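One common way to put a number on this rule is the expected value of perfect information (EVPI) from decision analysis: the most that any information, however good, could add to your expected return. The sketch below is an illustration under stated assumptions, not a description of any specific method. The scenarios, probabilities, payoffs, and test cost are all hypothetical, as are the function names.

```python
def expected_value_of_perfect_information(scenarios):
    """Upper bound on what any test is worth for a choice between A and B.

    scenarios: list of (probability, payoff_if_a, payoff_if_b) tuples whose
    probabilities sum to 1. These are your prior beliefs, stated explicitly.
    """
    total_p = sum(p for p, _, _ in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")

    # Best you can do today: commit to the version with the higher expected payoff.
    ev_a = sum(p * pay_a for p, pay_a, _ in scenarios)
    ev_b = sum(p * pay_b for p, _, pay_b in scenarios)
    best_without_info = max(ev_a, ev_b)

    # Best you could do if you learned the true scenario before choosing.
    best_with_info = sum(p * max(pay_a, pay_b) for p, pay_a, pay_b in scenarios)

    return best_with_info - best_without_info

def worth_testing(scenarios, test_cost):
    """Collecting information pays only if it costs less than it can return."""
    return test_cost < expected_value_of_perfect_information(scenarios)

# Hypothetical beliefs: 70% chance B beats A (12,000 vs 10,000),
# 30% chance B is worse than A (9,000 vs 10,000).
beliefs = [(0.7, 10_000, 12_000), (0.3, 10_000, 9_000)]

evpi = expected_value_of_perfect_information(beliefs)
print(f"EVPI = {evpi:.0f}")
print("Worth running a test that costs 500?", worth_testing(beliefs, 500))
```

In this example even perfect information is worth only about 300, so a test costing 500 cannot pay for itself. A real test gives imperfect information and is worth less than the EVPI, which makes this a generous ceiling.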

You are not in the hypothesis testing business. You are in the business of adding value for your shareholders (that is, you and your investors). To deliver value you need to make decisions in the presence of uncertainty. With all its flaws, A/B testing is not the right solution for decision making!

So stop using A/B testing!

What do I recommend? Send me a note to read a preview of my article on “Iterative Bayesian (TM)”.