In A/B testing, you control for many factors and test only one hypothesis – be it two different calls to action or two different colors for BuyNow buttons. When you find a statistically significant difference in conversion rates between the two groups, you declare one version superior to the other.
Hidden in this hypothesis testing are many implicit hypotheses that we treat as truth. If any one of them proves to be false, our conclusion from the A/B test may be wrong.
Dave Rekuc, who runs an eCommerce site, posed a question on Avinash Kaushik’s blog post about tests for statistical significance and A/B testing. Dave’s question surfaces one such hidden hypothesis:
I work for an ecommerce site that has a price range of anywhere from $3 an item to $299 an item. So, I feel like in some situations only looking at conversion rate is looking at 1 piece of the puzzle.
I’ve often used sales/session or tried to factor in AOV when looking at conversion, but I’ve had a lot of trouble coming up with a statistical method to ensure my tests’ relevance. I can check to see if both conversion and AOV pass a null hypothesis test, but in the case that they both do, I’m back at square one.
Dave’s question is whether the result of the conversion test holds true across all price ranges.
He is correct in stating that looking at conversion rate alone is looking at one part of the puzzle.
When items vary in price, as he said, from $3 to $299, the test for statistical significance of the difference between conversion rates rests on an implicit hypothesis that is treated as truth:
A1: The difference in conversion rates does not differ across price ranges.
and the null hypothesis (unchanged, added here for completeness):
H0: Any difference between the conversion rates is due to randomness
When your data tells you that H0 can or cannot be rejected, that conclusion is conditioned on the implicit assumption A1 being true.
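As a minimal sketch of the test behind H0, the pooled two-proportion z-test below compares the conversion rates of two groups. The function name and the visitor and conversion counts are hypothetical, chosen only for illustration, and the test assumes samples large enough for the normal approximation to hold.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, so estimate it from the pooled data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value here says the difference is unlikely to be random noise for the population as a whole. It says nothing about whether A1 holds.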
But what if A1 is false? Dave uncovered one hidden hypothesis in his case. What about the many others? Other examples include treating the target population as homogeneous (no male/female difference, no geography-specific difference, etc.) and treating all products as the same.
In one of my previous posts, I point out two different results from the same data set, depending on whether or not the population is segmented.
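One rough way to check A1 directly, sketched here for just two price bands, is to compare the lift (B's conversion rate minus A's) in one band against the lift in the other. This is an assumed approach, not one prescribed above: a z-test on the difference between the two lifts, using unpooled variances. The function names and all counts are hypothetical.

```python
import math

def lift_and_variance(conv_a, n_a, conv_b, n_b):
    """Return the lift (rate of B minus rate of A) and its estimated variance."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a
    var = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return lift, var

def compare_lifts(band_1, band_2):
    """Two-sided z-test of whether the lift differs between two price bands."""
    lift_1, var_1 = lift_and_variance(*band_1)
    lift_2, var_2 = lift_and_variance(*band_2)
    z = (lift_1 - lift_2) / math.sqrt(var_1 + var_2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts per band: (conversions A, visitors A, conversions B, visitors B)
low_priced = (200, 2000, 260, 2000)
high_priced = (30, 1000, 28, 1000)

z, p_value = compare_lifts(low_priced, high_priced)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the lift is not the same in both bands, which is evidence against A1 for this made-up data.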
That is the peril of hidden hypotheses.
What is the solution for a situation like Dave’s? Either you explicitly test this assumption first or, as a simpler option, you segment your data and test each segment for statistical significance. Since you have a range of price points, I recommend you test over 4-5 price ranges.
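The segment-and-test option can be sketched as follows. The four price bands, the counts, and the 0.05 threshold are all hypothetical choices for illustration; the test used is a pooled two-proportion z-test, which assumes each segment is large enough for the normal approximation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-sided z-test for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: band -> (conversions A, visitors A, conversions B, visitors B)
segments = {
    "$3-$25": (200, 2000, 260, 2000),
    "$26-$75": (90, 1500, 105, 1500),
    "$76-$150": (40, 1000, 42, 1000),
    "$151-$299": (30, 1000, 28, 1000),
}
ALPHA = 0.05  # assumed significance threshold

# Test each price band on its own.
results = {band: two_proportion_z_test(*counts) for band, counts in segments.items()}
significant = [band for band, (_, p) in results.items() if p < ALPHA]

# Test the unsegmented data for comparison by summing the counts across bands.
totals = [sum(column) for column in zip(*segments.values())]
pooled_z, pooled_p = two_proportion_z_test(*totals)

for band, (z, p) in results.items():
    print(f"{band}: z = {z:.2f}, p = {p:.4f}")
print(f"all prices: z = {pooled_z:.2f}, p = {pooled_p:.4f}")
print("significant bands:", significant)
```

With these made-up numbers the unsegmented test is significant, yet only the cheapest band is significant on its own. Keep in mind that running several tests raises the chance that at least one segment looks significant by chance, so a stricter per-segment threshold may be warranted.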
What is the solution for the bigger problem of many different hidden hypotheses?
Talk to me.