Knowing and Avoiding Selection Bias

This is a brief exchange I heard on Talk of the Nation; the segment was about drowning deaths:

CONAN: Of the cases that you’ve studied, would you think that what percentage would think were – these were preventable deaths?

Dr. MODELL: Well, it’s hard to put a percentage because it’s a skewed series. The only ones in the series are people who died and then ended up in court, not the ones that were saved because the lifeguards or someone else did the proper things. So the numbers don’t mean that much.

I should not be surprised, but after reading so many blogs and even WSJ opinion pieces that are filled with selection, survivorship and other cognitive biases, I did not expect someone participating in an on-the-air conversation to notice and deftly avoid selection bias.

Here is an article, “A Selection of Selection Anomalies” (PDF).

One such case discussed in the article was the work of Abraham Wald:

During WW-II, the military observed that planes were returning with bullet holes in some parts and wanted to reinforce those parts. Wald was asked to help with this project. Intuitively, it would make sense to add more armor to the parts that got hit so often. But the intuition suffers from selection bias: the military was looking only at planes that survived and landed safely despite getting hit in these areas. Wald reasoned that planes hit in other parts most likely did not survive, and hence recommended adding armor to the parts that did not have bullet holes.
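A small simulation can make Wald's reasoning concrete. The sketch below is only an illustration, not Wald's actual method or data: the plane sections, the number of hits per plane and the per-section loss probabilities are all invented assumptions. Every section is hit equally often, yet the returning planes show far fewer holes in the sections where a hit is most likely to be fatal.

```python
import random
from collections import Counter

# Hypothetical probability that a single hit to a section downs the plane.
# These numbers are invented for illustration only.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1}
SECTIONS = list(LOSS_PROB)


def fly_missions(n_planes, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits on returning planes), counted by section."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Assumption: every section is equally likely to be hit.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() >= LOSS_PROB[s] for s in hits)
        all_hits.update(hits)
        if survived:
            returned_hits.update(hits)
    return all_hits, returned_hits


all_hits, returned_hits = fly_missions(20_000)
for section in SECTIONS:
    print(f"{section:9s} hits overall: {all_hits[section]:6d}   "
          f"holes seen on returning planes: {returned_hits[section]:6d}")
```

Someone who sees only the second column would conclude that the fuselage and wings take most of the fire, when in this toy model the engine and cockpit are hit just as often and simply do not come back.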

It is not enough to look for evidence that supports our notions; we also need to look for evidence that contradicts them.

Why I stopped reading Antifragile at page 43

I am a big fan of Fooled by Randomness and The Black Swan. However, I decided to give up on Taleb’s new book, Antifragile (which I borrowed from the library), after just the 43rd page.

It is possible that I am not able to appreciate the dense material the author presents in this book. It is also possible that the author would point to other factors, like my choosing easy over hard. Or that I am happy to give up because I only borrowed the book from the library and did not buy it (although I would do the same had I bought it, because the cost of the book would be sunk).

It is also possible that I was put off by the offensive stance the author takes in defending his idea against the establishment and all those who, according to him, refuse to understand that antifragile is the correct opposite of fragile (and not robust).

I also cannot rule out that the theory of antifragility is supported only by the very data that was used to develop it.

But the real reason is what I read on page 43. Here are some snippets, my peeves with them, and hence my decision to return the book.

They say best horses lose when they compete with slower ones and win against better rivals

Where is the data? How can a statement like “they say” find its place in a book proposing a new theory?

many do better in Calculus 103 than Calculus 101

Isn’t there selection bias here? Isn’t it likely that only a few move on to take Calculus 103, and that those who do are more interested in the subject?
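To see how selection alone could produce such an observation, here is a toy simulation. Everything in it is an assumption made up for illustration: the cohort size, the "interest" scale, the grading rule and the cutoff for who continues. Both courses grade students the same way, and nobody improves between them; only the mix of students changes.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical cohort: each student has a fixed interest/aptitude level.
interest = [rng.gauss(70, 10) for _ in range(10_000)]


def grade(level):
    """Grade = interest level plus exam-day noise; same rule for both courses."""
    return level + rng.gauss(0, 5)


# Everyone takes Calculus 101.
calc101 = [grade(level) for level in interest]

# Assumed selection rule: only the most interested students go on to 103.
continuing = [level for level in interest if level >= 80]
calc103 = [grade(level) for level in continuing]

print(f"Calculus 101: {len(calc101)} students, mean grade {mean(calc101):.1f}")
print(f"Calculus 103: {len(calc103)} students, mean grade {mean(calc103):.1f}")
```

The higher average in the later course says nothing about the course or about anyone getting better; it reflects who was left to take it.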

If I find these problems in subjects I understand even remotely, what about the bigger problems I would fail to catch in subject areas I do not understand?

Hence my decision to punt on the book.

Fail fast because successful companies failed before they succeeded

There are several versions of this statement. One way or another they glorify failure, and in the name of exhorting startup founders these inspirational statements lead one to believe:

  1. After a few failures success is inevitable
  2. You must fail first to succeed
  3. Fail fast so you can succeed
  4. Failures signal impending success
  5. “Failure can be a true blessing in that it educates you and prepares you for success” (from here)
  6. “Remember that most successful entrepreneurs fail good and hard before they finally make it” (same source)

Those who make these assertions are happy to point to popular examples. The problem is that the assertions are derived from the very examples they use as evidence.

First, let us make something very clear. Success and failure are the only two possible outcomes for any venture you undertake. But the fact that there are just two outcomes does not mean they are equally probable. This is not a case of tossing a fair coin and calculating the odds of heads or tails. The chances of success and failure can be, and are, very different. If you take the base rate (the success rate across thousands of ventures and small businesses), it is 3 to 5%.

Second, even if we assume that success and failure are equally likely, a series of failures does not mean inevitable success. Take the coin example. The probability of getting 10 tails in a row is the same as the probability of getting 9 tails in a row followed by a head.
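The coin claim is easy to check. The sketch below assumes a fair coin and independent tosses, computes both probabilities exactly, and then confirms by enumerating every 10-toss sequence that a run of 9 tails leaves the chance of a head on the next toss unchanged.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # fair coin: heads and tails equally likely

ten_tails = p ** 10                 # T T T T T T T T T T
nine_tails_then_head = p ** 9 * p   # T T T T T T T T T H

print("P(10 tails in a row)    =", ten_tails)
print("P(9 tails, then a head) =", nine_tails_then_head)

# Brute-force check: list all 2**10 equally likely sequences, keep the ones
# that start with 9 tails, and see how often the 10th toss is a head.
sequences = list(product("HT", repeat=10))
after_nine_tails = [s for s in sequences if s[:9] == tuple("T" * 9)]
heads_next = sum(1 for s in after_nine_tails if s[9] == "H")
p_head_after_streak = Fraction(heads_next, len(after_nine_tails))

print("P(head | 9 tails so far) =", p_head_after_streak)
```

A losing streak does not make the next toss any more likely to win, which is exactly the gambler's fallacy the "fail first" advice leans on.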

Lastly, the fact that those who succeeded had failed in the past is irrelevant. Those who make such an argument pick only the success stories that are popular, recent and available to them. When you look only at those who succeeded and are still in business, you are leaving out all those who never succeeded and gave up, or are still trying without success. Even in the cited success stories, success is mostly random rather than a result of earlier failures. The fact that those who succeeded had “failed hard” does not mean that when you fail you will succeed.

Granted, they learned from their mistakes, but you do not have to learn from your own mistakes. You do not have to fail to learn. Failure is not the true blessing. Insane success, with a valuation in the hundreds of billions even when your venture has no real product or clear value add, is the true blessing.

Those who advise you to fail are not being intellectually honest. Their advice is no different from advising a gambler to bet on a slot machine that has been coming up empty for the past few hours.

Let us get street education, who needs formal education …

There is considerable noise being made to discourage kids from going to college, to be a “pirate”, to skip college education to start something or to follow their passion. The argument goes, “because people who started something big, Bill Gates, Steve Jobs and Mark Zuckerberg, did not need a college degree”.

Reporters like Sarah Lacy, instead of taking a critical look at such discussions, are joining in.

http://twitter.com/#!/sarahcuda/status/58666464588201984

I believe most are aware of the meaningful counterargument made by Vivek Wadhwa and others. Instead of repeating that, I will quote what Rev. Al Sharpton said in an interview with Stephen Colbert.

Maybe the reporters and their like will see the cognitive biases in their case for asking kids to drop out of college.

The interview video link is here, and it is unrelated to this noise about the education bubble:

Colbert: You don’t have higher education. You got your education on the streets my friend, education in the church. Why can’t we give that to children and forget about the books?
Isn’t there sir, a tyranny in this country that everything gotta be out in a book? Why can’t we let these kids fly, be free?

Sharpton: (pointing at Colbert) See this is Exhibit A why we need education.
I know. I dropped out of college. I know myself everyday the regrets I have in not pursuing my degree.

Colbert: But you have done very well sir.

Sharpton: How many people that came out of the same neighborhood that I did, that had the same background I did was able to make without an education. Most of my friends ended up in jail or dead. I want to make sure that doesn’t happen to the next generation.

Rev. Al Sharpton gets selection bias and survivorship bias.

Do Peter Thiel and Sarah Lacy get that P(A|B) is not the same as P(B|A)?

Do they know, if tens of thousands of kids listen to them and skip college, how many will have a future like Gates, Jobs or Zuckerberg?
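A worked example shows how far apart the two conditional probabilities can be. All the counts below are invented purely for illustration and are not real statistics; A stands for "skipped or dropped out of college" and B for "became a Gates, Jobs or Zuckerberg style founder".

```python
from fractions import Fraction

# All counts are made up for illustration; they are NOT real statistics.
dropouts = 50_000        # kids who skip or drop out of college
graduates = 950_000      # kids who finish college
famous_dropouts = 3      # dropouts who became celebrated founders
famous_graduates = 2     # graduates who became celebrated founders

population = dropouts + graduates
famous = famous_dropouts + famous_graduates

# P(A|B): among celebrated founders, the share who dropped out.
p_dropout_given_famous = Fraction(famous_dropouts, famous)

# P(B|A): among dropouts, the share who became celebrated founders.
p_famous_given_dropout = Fraction(famous_dropouts, dropouts)

# Bayes' rule ties the two together: P(B|A) = P(A|B) * P(B) / P(A).
p_famous = Fraction(famous, population)
p_dropout = Fraction(dropouts, population)
via_bayes = p_dropout_given_famous * p_famous / p_dropout

print(f"P(dropout | famous founder) = {float(p_dropout_given_famous):.2%}")
print(f"P(famous founder | dropout) = {float(p_famous_given_dropout):.4%}")
print("Bayes' rule agrees:", via_bayes == p_famous_given_dropout)
```

Under these made-up numbers, most of the celebrated founders are dropouts, yet almost no dropout becomes a celebrated founder. The stories people cite speak to the first number; the advice to skip college depends on the second.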

The Real Limitations of Evidence Based Marketing

This is not a reply to Seth Godin’s post. While his post triggered this one, the two articles use the term marketing differently. To me, Mr. Godin’s article uses marketing to mean marketing communication and messaging. I could not agree more on the need to present the message in a more interesting and acceptable format than just gobs of facts and data. We cannot hope that rational automatons will figure it out; years of behavioral economics research have shown that we are anything but rational.

Marketing, to me, means strategic marketing: activities that come well before we get to messaging, like choosing the segment to target, the product versions to deliver, the pricing strategy and how to reach these customers. This article is about that definition of marketing. However, Mr. Godin’s use of the phrase “Evidence Based Marketing” for a messaging tactic will create confusion in the minds of many. Using data and facts in messaging is a tactic; it is not Evidence Based Management.
Read on.

Evidence Based Marketing is flawed; it is rife with errors. The common error types are:

Inherent Data Errors: Data is noisy and incomplete. Sometimes the readings can be just plain wrong due to the probabilistic nature of the data source.
Environmental Errors: Context and environment collude to further muddy the water by tricking our eyes, minds and tools. Through no fault of our own we see mountains when there aren’t any (like John Ross’ Croker Mountains).
Observer Effect: Call it the Heisenberg uncertainty principle or the Hawthorne Effect. The very act of reading the data can taint it.
Methodological Errors: Call it sampling error or survey error, our methods of data collection are inadequate. They introduce their own errors that are sometimes unquantifiable.
Data Collection Errors: We do not know what or how to measure. We ask the wrong survey question or add the wrong stimulus but treat the received data as the answer to the question we had in mind.
Analysis Errors: Tests for statistical significance are flawed and misused. Not just Type-I and II errors, the very method of hypothesis testing is flawed. We collect data conditioned on the hypothesis being true and declare, “since data fit the hypothesis we conditioned to be true, the hypothesis is true”. We take comfort, wrongly, in using lower p-values. When data fit the hypothesis we stop, ignoring the fact that data can fit any number of hypotheses.
More Analysis Errors: We take comfort in adopting more rigorous and computationally intensive analytical methods. We treat correlation as causation, and even regression as causation, and look to the past to guide the future. We ignore lurking variables that could explain common causation.

Yet, Evidence Based Marketing is preferable to the alternative: marketing based on guts, fads, a guru’s revelation, the opinion of the person with the highest title, and just plain observations.

The alternative is rife with errors that are not easy to see or measure: cognitive errors. We suffer from multiple cognitive biases like Recency Effect, Availability Bias, Selection Bias and Survivorship Bias. Worse, the alternative relies on Commitment Bias and Agreement Bias (peer pressure) to push through a version of truth. Here the truths are unfalsifiable and it is forbidden to look for falsifying evidence. Anyone stepping outside the accepted norm is labeled and shamed into compliance.

Evidence Based Marketing is flawed, but it is clear and explicit about its flaws and lets the decision maker be the judge. The good practitioners know its flaws and the bad ones cannot overlook them for too long.

Evidence Based Marketing eliminates the need for a charismatic leader and survives across leadership changes. The Alternative relies on the words and presence of the leader, and the methods change when leadership changes. One myth is replaced by another.

Both Evidence Based Marketing and its Alternative will take us to Madagascar but will try to tell us it is actually San Diego. Evidence Based Marketing would do that through data selection error; the Alternative will just tell us so with authority.

When newly uncovered data shows otherwise, Evidence Based Marketing will reduce its certainty and state the likelihood that its prediction is correct.

Evidence Based Marketing is about forming a repeatable and falsifiable theory. The practitioners know that their theory holds only because there isn’t yet data to prove otherwise.

Which pill is it going to be for you?