Most of my work on pricing and consumer behavior studies relies on hypothesis testing. Whether I am finding a difference in means between two groups, running a non-parametric test, or making a causal claim, I am applying hypothesis testing, explicitly or implicitly. I make overarching claims about customers' willingness to pay, and the factors that influence it, based on hypothesis testing. The same is true for the most popular topic these days for anyone with a web page: AB split testing. There is nothing wrong with these methods, and I bet I will continue to use them in my future work.
We should note, however, that using hypotheses and finding statistically significant differences should not blind us to the fact that some amount of subjectivity goes into all of this. Another important distinction: despite the name, hypothesis testing does not test whether a hypothesis is true. It tests how well the data fits a hypothesis that we take as given. More on this below.
All these tests proceed as follows:
- Start with the hypotheses. In fact, you always start with two. The first is the null hypothesis, which is the same for any statistical test:
The null hypothesis H0: The observed difference between subjects (or groups) is just due to randomness.
Then you write down the hypothesis that you want to make a call on.
Alternative hypothesis H1: The observed difference between subjects (or groups) is indeed due to one or more treatment factors that you control for.
- Pick the statistical test that fits your case from those available: a non-parametric test like chi-square, which makes no assumption about the distribution of the data (common in AB testing), or a parametric test like the t-test, which assumes the data follows a Gaussian (i.e., normal) distribution.
- Select a significance level for the test, usually stated as a confidence level of 90%, 95%, or 99%, with 95% being the most common. This choice is completely subjective. At 95% you are stating that the results are statistically significant only if randomness alone would produce a difference at least this large in less than 5% (100% - 95%) of cases. This threshold, here 0.05, is the significance level α; the p-value (a probability) that the test produces is compared against it.
- Perform the test with random sampling. This needs more explanation, but it is beyond the scope of what I want to cover here.
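The steps above can be sketched for a simple AB test on conversion rates. This is a minimal sketch using only the Python standard library, not a production tool: the function name `chi_square_2x2`, the conversion counts, and the choice of Pearson's chi-square without Yates' continuity correction are all illustrative assumptions. H0 is that conversion is independent of the variant, and α = 0.05 is fixed before the test is run.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test on a 2x2 table (variant x converted / not converted),
    without continuity correction. Returns (statistic, p_value); 1 degree of freedom."""
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if conversion is independent of the variant (H0).
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical AB test: 200 of 5,000 visitors converted on A, 250 of 5,000 on B.
ALPHA = 0.05  # significance level, chosen before looking at the data
stat, p = chi_square_2x2(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
if p < ALPHA:
    print(f"chi2 = {stat:.2f}, p = {p:.4f}: reject H0 at alpha = {ALPHA}")
else:
    print(f"chi2 = {stat:.2f}, p = {p:.4f}: fail to reject H0 at alpha = {ALPHA}")
```

With these made-up counts the statistic is about 5.82 and p is about 0.016, so H0 is rejected at 0.05. If SciPy is available, `scipy.stats.chi2_contingency([[200, 4800], [250, 4750]], correction=False)` should give the same statistic; its default applies Yates' continuity correction to 2x2 tables, which yields a somewhat larger p-value.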
As you can see, we, the analysts or decision makers, make up the hypothesis and treat it as given. We did the right thing by writing it down first. (A common mistake in many AB tests and data mining exercises is writing the hypothesis after the test.)
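To see why writing the hypothesis after the test is a problem, here is a small simulation. It is a sketch under assumptions made up for illustration: 20 hypothetical metrics, normally distributed with a known standard deviation of 1, compared with a two-sample z-test in A/A experiments where there is truly no difference. The helper names `two_sided_z_pvalue` and `simulate_post_hoc_fishing` are hypothetical. Each test on its own is wrong about 5% of the time, but picking whichever metric "won" after looking at the data finds something in roughly 1 - 0.95^20 ≈ 64% of experiments (the 20 tests here are independent).

```python
import math
import random


def two_sided_z_pvalue(sample_a, sample_b, sigma=1.0):
    """Two-sample z-test p-value, assuming a known, common standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    z = diff / (sigma * math.sqrt(1 / n_a + 1 / n_b))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_post_hoc_fishing(n_metrics=20, n_per_group=30, n_experiments=1000,
                              alpha=0.05, seed=1):
    """A/A experiments: both groups come from the same distribution, so H0 is
    true for every metric. Returns (share of single tests that come out
    'significant', share of experiments where at least one metric does)."""
    rng = random.Random(seed)
    significant_tests = 0
    experiments_with_a_hit = 0
    for _ in range(n_experiments):
        hit = False
        for _ in range(n_metrics):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sided_z_pvalue(a, b) < alpha:
                significant_tests += 1
                hit = True
        experiments_with_a_hit += int(hit)
    return (significant_tests / (n_experiments * n_metrics),
            experiments_with_a_hit / n_experiments)


per_test, per_experiment = simulate_post_hoc_fishing()
print(f"false positive rate of a single pre-stated test: {per_test:.3f}")
print(f"share of experiments with at least one 'finding': {per_experiment:.3f}")
```

The first number should land near 0.05 and the second near 0.64: the 5% error rate only holds for a hypothesis stated before the data is seen.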
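What the test actually computes is the probability of observing data D at least as extreme as ours, assuming one particular hypothesis is true. That hypothesis is the null, H0, which the calculation takes as given (P(H0) = 1).
This is expressed as P(D|H0), the p-value, and the result is statistically significant at the 95% level when P(D|H0) < 0.05. What is never computed is the probability that H1 is true given the data, P(H1|D).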
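A permutation test computes the probability of the data under the null hypothesis, P(D|H0), almost literally: assume H0 (the group label makes no difference), reshuffle the labels many times, and count how often chance alone produces a difference at least as large as the one observed. This is a sketch; the conversion data is made up for illustration and `permutation_p_value` is a hypothetical helper, not part of any library. Notice that H1 never enters the calculation.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate P(D | H0) for a difference in means by permutation.

    H0 says the group labels don't matter, so every reshuffling of the pooled
    data is equally likely. The p-value is the share of reshuffles that give a
    difference at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(group_a, group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs_mean_diff(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one estimator: counts the observed labeling itself, so p is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical data: 1 = converted, 0 = did not convert.
group_a = [1] * 20 + [0] * 80   # 20% conversion
group_b = [1] * 35 + [0] * 65   # 35% conversion
p = permutation_p_value(group_a, group_b)
print(f"estimated P(D|H0) = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A small estimate says the data would be surprising if H0 were true. It says nothing about how probable H1 is.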
When we say we accept H1, we are really saying that H0 (randomness) is an unlikely explanation and hence H1 must be true. In doing so, we implicitly rule out the possibility that the observed data could be explained by any number of alternative hypotheses. Since we wrote the original hypothesis ourselves, if we did not base it on proper qualitative analysis, we could be wrong even though our test yields statistically significant results.
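To illustrate how a significant result can leave the stated H1 unproven, here is a simulation with a built-in alternative explanation. Everything in it is hypothetical, including the helper names and the conversion rates: the page variant has no effect at all, but visitors were not assigned at random, and variant B happened to receive more returning visitors, who by assumption convert more often. The test correctly rejects H0, since the difference is not due to randomness, yet the H1 we would have written down ("variant B causes more conversions") is false.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test (equivalent to a 2x2 chi-square
    test without continuity correction)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_confounded_test(n_per_group=20_000, seed=7):
    """Hypothetical AB test in which the page variant has NO effect.

    Conversion depends only on visitor type (assumed rates: returning 10%,
    new 4%). Visitors were not randomly assigned: variant B got 50%
    returning visitors versus 20% for variant A."""
    rng = random.Random(seed)
    conversion_rate = {"returning": 0.10, "new": 0.04}
    share_returning = {"A": 0.20, "B": 0.50}
    conversions = {}
    for variant, share in share_returning.items():
        converted = 0
        for _ in range(n_per_group):
            visitor = "returning" if rng.random() < share else "new"
            # The variant never enters this line: only the visitor type matters.
            if rng.random() < conversion_rate[visitor]:
                converted += 1
        conversions[variant] = converted
    return conversions


N = 20_000
conv = simulate_confounded_test(n_per_group=N)
p = two_proportion_p_value(conv["A"], N, conv["B"], N)
print(f"conversions A = {conv['A']}, B = {conv['B']}, p = {p:.2g}")
```

The p-value comes out far below 0.05, but the difference is fully explained by the visitor mix. Only understanding the subjects, and sampling properly, tells you which hypothesis to write down.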
This is why you should never launch a survey without first running focus groups and customer interviews, and why you should not jump into statistical testing before you understand enough about the subjects under study to frame relevant hypotheses. Otherwise, as some people wrote to me, you are using gut feel or pulling hypotheses out of thin air, and accepting them simply because the data was enough to reject the null hypothesis.
How do you come up with your hypotheses?
Look for my next article on how this is different in Bayesian statistics.