Knowing and Avoiding Selection Bias

Here is a brief conversation I heard on Talk of the Nation; the program segment was on drowning deaths:

CONAN: Of the cases that you’ve studied, would you think that what percentage would think were – these were preventable deaths?

Dr. MODELL: Well, it’s hard to put a percentage because it’s a skewed series. The only ones in the series are people who died and then ended up in court, not the ones that were saved because the lifeguards or someone else did the proper things. So the numbers don’t mean that much.

I should not have been surprised, but after reading so many blogs and even WSJ opinion pieces filled with selection, survivorship, and other cognitive biases, I did not expect someone in an on-air conversation to notice and deftly avoid selection bias.

Here is an article, “A Selection of Selection Anomalies” (PDF).

One such case discussed in the article was the work of Abraham Wald:

During WWII, the military observed that some planes were returning with bullet holes in certain parts and wanted to reinforce those parts. Wald was asked to help with the project. Intuitively it would make sense to add more armor to the parts that got hit so often. But that intuition suffers from selection bias: the military was only looking at planes that survived and landed safely despite being hit in those areas. Wald reasoned that planes hit in other parts most likely did not survive, and hence recommended adding armor to the parts that did not have bullet holes.
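To make the bias concrete, here is a minimal simulation sketch. It is my own illustration, not from the article: the sections, hit distribution, and loss probabilities are made-up assumptions. The point is that if engine hits rarely come home, the returning planes will show few engine holes even when engines are hit just as often as everything else.

```python
import random

# Survivorship-bias sketch (illustrative assumptions, not real WWII data).
# Hits land uniformly on four sections; a hit to the engine downs the plane
# far more often than a hit elsewhere.
SECTIONS = ["engine", "fuselage", "wings", "tail"]
LOSS_PROB = {"engine": 0.8, "fuselage": 0.2, "wings": 0.1, "tail": 0.1}

random.seed(42)
returned_hits = {s: 0 for s in SECTIONS}
all_hits = {s: 0 for s in SECTIONS}

for _ in range(100_000):
    section = random.choice(SECTIONS)          # where this plane is hit
    all_hits[section] += 1
    if random.random() > LOSS_PROB[section]:   # plane survives and returns
        returned_hits[section] += 1

print("Hits seen on returning planes:", returned_hits)
print("Hits on all planes (unobservable in practice):", all_hits)
# Returning planes show few engine hits -- not because engines are rarely hit,
# but because planes hit in the engine rarely come back to be counted.
```

Looking only at the returned planes, the engine appears to be the “safest” part, which is exactly the trap Wald avoided.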

It is not enough to look for evidence that supports our notions; we need to look for evidence that would contradict them.

Can you add my one question to your survey?

No sooner do you let it be known, mostly inadvertently, that you are about to send out a survey to customers than the incessant requests (and commands) start from your co-workers (and bosses) to add just one more question to it. Just one more question they have been dying to find the answer to but have never gotten around to running a survey, or anything else, to answer.

Just one question, right? What harm can it do? Sure, you are not opening the floodgates and adding everyone’s question, just the one to satisfy the HiPPO?

Maybe I am being unfair to our colleagues. It is possible it is not them asking to add one more question; it is usually we who are tempted to add just one more question to the survey we are about to send out. If survey takers are already answering a few, it can’t be that bad for them to answer one more, can it?

The answer is yes, of course it can be really bad. Resist any arm-twisting, bribery, and your own temptation to add that one extra question to a carefully constructed survey. That is, I am assuming you did carefully construct the survey; if not, sure, add them all, the answers are meaningless and unactionable anyway.

To define what a carefully constructed survey means, we need to ask, “What decision are you trying to make with the data you will collect?”

If you do not have decisions to make, if you won’t do anything different based on the data collected, or if you are committed to whatever you are doing now and are only collecting data to satisfy an itch, then you are doing it absolutely wrong. And in that case, yes, please add that extra question from your boss for some brownie points.

So you do have decisions to make and have made sure the data you seek is not available through any other channel. Then you need to develop a few hypotheses about the decision. You do that through background exploratory research, including one-on-one customer interviews, social media search analysis, and, if possible, focus groups. Yes, we are actually paid to make better hypotheses, so you should take this step seriously.

For example, your decision is how to price a software offering, and your hypotheses are about value perception of certain key features and consumption models.

Once you develop a minimal set of well-defined hypotheses to test, you design the survey to collect data to test those hypotheses. Every question in your survey must serve to test one or more of the hypotheses. On the flip side, you may not be able to test all your hypotheses in one survey, and that is okay. But if there is a question that does not serve to test any of the hypotheses, then it does not belong in that survey.
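One lightweight way to enforce that discipline is to keep an explicit question-to-hypothesis map and reject any question that tests nothing. Here is a minimal sketch; the hypotheses and questions are made-up examples for the pricing scenario above, not part of any real survey.

```python
# Sketch: every survey question must map to at least one stated hypothesis.
# Hypotheses and questions are illustrative examples for a pricing survey.
hypotheses = {
    "H1": "Customers value feature X enough to pay a premium",
    "H2": "Customers prefer subscription over perpetual licensing",
}

survey_questions = [
    {"text": "How important is feature X to your workflow?", "tests": ["H1"]},
    {"text": "Would you rather pay monthly or a one-time fee?", "tests": ["H2"]},
    {"text": "What is your favorite color?", "tests": []},  # the "one extra question"
]

for q in survey_questions:
    valid = [h for h in q["tests"] if h in hypotheses]
    if valid:
        print(f"Keep: {q['text']!r} (tests {', '.join(valid)})")
    else:
        print(f"Drop: {q['text']!r} -- tests no hypothesis")
```

The check is trivial, but writing the map down forces the conversation: which hypothesis does your colleague’s extra question test?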

The last step is deciding the relevant target mailing list you want to send this survey to. After all, there is no point in asking the right questions of the wrong people.

Now you can see what adding that one extra question from your colleague does to your survey. It did not come from your decision process, it does not help test your hypotheses, and it is most likely not relevant to the sample you are using.