Yet Another Causation Confusion – School Ratings and Home Foreclosures

Here is the headline from WSJ,

One Antidote to Foreclosures: Good Schools

The article is based on data mining by Location Inc, which found that

over past six months percentage of foreclosure (or “real-estate-owned”) sales went down as the school ranking went up in five metro areas

Here we see a news story attributing clear causation. The fallacy is hard for some to notice because the data appear to be longitudinal – reported over a six-month period. If it were a simpler cross-sectional study, most readers would catch the fallacy right away.

First, let me point out that this is the problem with data mining – digging for nuggets in mountains of Big Data without an initial hypothesis and then reading causation into whatever turns up.

Second, the improvement in school ratings could be due to random factors that happen to coincide with lower foreclosures.

Third, even though the longitudinal aspect seems to imply causation, there are many omitted variables here – the common factors that drive foreclosures down and school ratings up.

School rating is not an independent metric. It relies not just on teacher performance but also on parents. The same people who are willing to work with their kids are also likely to be fiscally responsible. Another controversial but proven factor is the effect of genes on children’s performance.

If we ignore all this and focus our resources on improving school ratings to solve the foreclosure crisis, we will be like those who make loud noises to chase away the wolves that cause eclipses.

Butler’s Nored Gets Omitted Variable Bias

As of yesterday afternoon, Ronald Nored of Butler had scored fewer than 9 points in the NCAA tournament. He is not a starter; he comes off the bench as the sixth man. Nored’s jersey number is 5.

Nored may not have impressive statistics, but his jersey is hot. NPR asked Nored about this,

Mike Pesca: There are more number five jerseys in the student section than everyone else combined, which is kind of weird since you’re a sixth man. Why do you think that is?

Ronald Nored: I have no – that’s a great question. I have no idea. My guess is probably just because it’s a white jersey, and the white jersey looks good. I have no idea.

Mike Pesca: Yes, Nored says that the number five was available in white, and that’s the reason the students love him. You’d expect that kind of deflection from a guy who leads the team in steals per game.

Next time you read about a management fad, expounded by a popular guru, that attributes the success of businesses to certain externally visible positive traits, think like Nored. What other common reason could be causing both the business success and the traits?

Sufficient but not Necessary!

The traditional media and the social media are peppered with stories on how one can achieve success like other successful entities. Examples include 7 Habits, Good to Great, and numerous blog articles that follow a similar pattern. Almost all of these articles look at a successful business or person and look for observable positive traits. Then they attribute the success to the presence of those traits.

The general arguments against such studies include:

  1. Treating correlation as causation
  2. Different biases (survivorship, selection, availability, hindsight)
  3. Methodology errors like omitted variable bias
  4. Stopping as soon as the data fit a hypothesis, when the same data can fit any number of hypotheses

Even if we set all these flaws aside and accept that the success was indeed the direct result of the positive traits, there is another problem. These traits may be sufficient for success, but are they necessary?

Take an extreme example (for illustration). Let us say you observe a tall person in a fruit orchard. You observe her effortlessly pick many more fruits than others, thanks to her height, which gives her access to more opportunities. Her height was sufficient to get more fruits, but was it necessary?

Next time you see articles on “6/7/8/9 ways to do marketing/product-launches like Apple/Google/twitter/GratefulDead”, even if you look past the biases you should ask if the methods are relevant to your situation and are indeed necessary for your success.

Predictive Power of Customer Metrics

The usefulness of any customer metric depends on how actionable it is and how well it predicts business success. Let us define business success here as sales growth and profitability.

  1. Of all the metrics out there, is there one that serves as a good predictor of sales growth and profitability?
  2. Can there really be a single metric?
  3. What do you, as a small business owner, an entrepreneur or a decision maker for a large enterprise, need to know about the single metric trap?
  4. What other factors should you be aware of?

Read on.

Let us start with the most common customer metrics, including but not limited to

  1. ACSI – American Customer Satisfaction Index
  2. Top-2 Box (on a 5 point scale) Customer Satisfaction Score
  3. Number of recommendations – WoM, number of customers who actively recommend your product (service)
  4. Proportion of your customers complaining
  5. The Net Promoter Score

Supporters of some of these metrics claim theirs is the only metric any business needs to track. In the data they cite we will find a high positive correlation between these metrics and the two measures of business success. You do not need an advanced degree in statistics to ask, “Does this correlation mean causation?” But it does get a little tricky to sift through the data and the flaws in the analysis behind the case for a single metric that predicts business success.

The biggest flaw that can occur in any argument that a single variable alone has predictive power is Omitted Variable Bias. Is there a lurking variable, omitted from the model, that drove both the metric and business success? This is not to say that every argument built on one predictor suffers from Omitted Variable Bias, but to raise the possibility that another variable may explain the changes in your dependent variable.

Let me use an example to explain what it is before using it to explain the single metric trap.

This comes from Greg Mankiw. Suppose studies found a high correlation between children’s test scores and the number of bathrooms in their homes. Is this causation? Is this the single metric that determines success in tests? No. As Mankiw explains, the Omitted Variable here is the IQ of the parents. It is possible that parents with high IQ earn high incomes and hence have large houses with more bathrooms. Their children may have high IQ because of the good genes passed on by their parents.
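To make the bathroom example concrete, here is a minimal simulation sketch. Every number in it (the IQ distribution, the coefficients, the noise levels) is an assumption I made up for illustration, not data from Mankiw or any study. In this made-up world bathrooms have zero true effect on test scores, yet the naive regression slope is large. Once parent IQ is partialled out of both variables (the Frisch-Waugh-Lovell approach), the slope falls to roughly zero.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate_bathrooms(n=5000, seed=0):
    """Return (naive slope, slope after controlling for parent IQ)."""
    rng = random.Random(seed)
    # Hypothetical data-generating process, chosen only for illustration.
    parent_iq = [rng.gauss(100, 15) for _ in range(n)]
    # Bathrooms depend on parent IQ (through income), not on test scores.
    bathrooms = [1 + 0.03 * (iq - 70) + rng.gauss(0, 0.5) for iq in parent_iq]
    # Test scores depend on parent IQ only: bathrooms have zero true effect.
    scores = [50 + 0.4 * (iq - 100) + rng.gauss(0, 5) for iq in parent_iq]

    naive = ols_slope(bathrooms, scores)
    # Partial out parent IQ from both variables, then regress the residuals.
    controlled = ols_slope(residuals(parent_iq, bathrooms),
                           residuals(parent_iq, scores))
    return naive, controlled

if __name__ == "__main__":
    naive, controlled = simulate_bathrooms()
    print(f"score points per extra bathroom, ignoring parent IQ: {naive:.2f}")
    print(f"score points per extra bathroom, controlling for IQ: {controlled:.2f}")
```

The catch, of course, is that in real data you can only control for an omitted variable once you have thought of it and measured it.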

In the case of customer metrics, what could be the Omitted Variables? Some could be the nature of your products, your marketing strategy, your channel strategy, the nature of the competition, etc. The question worth asking is: is the highly correlated metric at hand the same as the number of bathrooms in a home? Let us take the third metric above, Number of Recommendations, as an example, just for illustrative purposes. Is it possible that the customers you are targeting simply have a high propensity to recommend? If you do not consider this possibility, you will incorrectly align all your resources and actions towards improving the number of recommendations without any impact on business goals.

That would result in a house full of bathrooms but still poor test scores.

I am not recommending that you give up on all metrics, but I urge you to understand Omitted Variable Bias and consider the perils of tracking just one variable.

  1. What are all the different factors that are relevant to the business you are in and to your customers?
  2. How do these factors influence the single customer metric and your business success?
  3. After accounting for all these other variables, what percentage of changes in sales growth and profitability can be explained by the changes in that single customer metric you track?
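The third question can be turned into a number: the incremental R-squared, that is, the share of variance in the outcome that the metric explains beyond what the other variables already explain. Below is a rough sketch for the simplest case of one control variable. The variable names (`propensity`, `recommendations`, `growth`) and the simulated numbers are hypothetical, picked to mirror the recommendation example above. With real data you would use a statistics package and several controls.

```python
import random

def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def _ss(v):
    """Sum of squares."""
    return sum(a * a for a in v)

def _residualize(y, x):
    """Remove from y the part linearly explained by x (both are centered)."""
    xc, yc = _center(x), _center(y)
    b = sum(a * c for a, c in zip(xc, yc)) / _ss(xc)
    return [c - b * a for a, c in zip(xc, yc)]

def incremental_r2(outcome, metric, control):
    """Share of outcome variance the metric explains beyond the control."""
    total = _ss(_center(outcome))
    after_control = _residualize(outcome, control)
    # Regress what the control left unexplained on the metric's own leftover.
    after_both = _residualize(after_control, _residualize(metric, control))
    return (_ss(after_control) - _ss(after_both)) / total

def demo(n=4000, seed=1):
    """Return (raw R-squared of the metric, incremental R-squared)."""
    rng = random.Random(seed)
    # Hypothetical: customer propensity drives both recommendations and growth.
    propensity = [rng.gauss(0, 1) for _ in range(n)]
    recommendations = [p + rng.gauss(0, 1) for p in propensity]
    growth = [2 * p + rng.gauss(0, 1) for p in propensity]
    raw = 1 - _ss(_residualize(growth, recommendations)) / _ss(_center(growth))
    return raw, incremental_r2(growth, recommendations, propensity)

if __name__ == "__main__":
    raw, extra = demo()
    print(f"R-squared of recommendations alone: {raw:.2f}")
    print(f"R-squared added beyond customer propensity: {extra:.4f}")
```

In this made-up setup the metric looks like a strong predictor on its own but adds almost nothing once the omitted variable is included.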

In evidence-based management, any metric must be questioned for its predictive power and for the methods by which the results were arrived at. Simplicity of a metric alone must not be the criterion.

Write to me, I will be happy to break this down more.


For a very readable and clear discussion of Omitted Variable Bias see also this post.

Correlation Causation Confusion

Here is a quote from today’s Journal’s opinion piece on minimum wage increase:

Study after study reveals that there are long-term career benefits to working as a teenager and that these benefits go well beyond the pay that these youths receive. A study by researchers at Stanford found that those who do not work as teenagers have lower long-term wages and employability even after 10 years.

What the study found is a correlation, but WSJ uses it to imply causation – not working as a teenager leads to lower long-term wages and employability. But isn’t it possible that there is an underlying cause for both these observed characteristics (omitted variable bias)? Is it possible that the same reason that led to unemployment as a teenager is driving low wages and low employability in later years?

On the other hand, for those with high wages and employability, is working as a teenager just one tool among many? Would they have used any other means equally effectively to achieve what they wanted?

Correlation does not imply causation. The Stanford study was an observation, not a controlled experiment in which researchers randomly selected teenagers, assigned them randomly to working and non-working groups, and then years later looked at their earning potential.
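Here is a small sketch of why random assignment matters. It assumes a hypothetical hidden trait, which I call `drive`, that makes a teenager both more likely to work and more likely to earn well later. Teen work itself is given zero true effect. None of these numbers come from the Stanford study; they are invented to show the mechanism.

```python
import random

def mean(v):
    return sum(v) / len(v)

def wage_gap(worked, wages):
    """Average later wage of those who worked minus those who did not."""
    yes = [w for flag, w in zip(worked, wages) if flag]
    no = [w for flag, w in zip(worked, wages) if not flag]
    return mean(yes) - mean(no)

def simulate_teen_work(n=20000, seed=2):
    """Return (observational wage gap, randomized-experiment wage gap)."""
    rng = random.Random(seed)
    # Hypothetical hidden trait that nobody measured.
    drive = [rng.gauss(0, 1) for _ in range(n)]
    # Later wages depend only on drive: teen work has zero true effect here.
    wages = [30 + 5 * d + rng.gauss(0, 3) for d in drive]
    # Observational world: more driven teenagers are more likely to work.
    chose_work = [d + rng.gauss(0, 1) > 0 for d in drive]
    # Experiment: a coin flip decides who works, regardless of drive.
    assigned_work = [rng.random() < 0.5 for _ in range(n)]
    return wage_gap(chose_work, wages), wage_gap(assigned_work, wages)

if __name__ == "__main__":
    observed, randomized = simulate_teen_work()
    print(f"wage gap in observational data: {observed:.2f}")
    print(f"wage gap under random assignment: {randomized:.2f}")
```

The observational gap is large even though work does nothing in this setup, while the coin-flip gap hovers around zero, because randomization breaks the link between the hidden trait and who ends up working.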

This is not the first time the Journal has pushed causation based on correlation. You can find more such causation confusion from WSJ here and here.

People Who Read WSJ Are 75% More Likely To …

Does reading The Wall Street Journal make one more likely to get better jobs and bigger salaries?

The Saturday edition had a half-page ad for student subscriptions. You can find the claims made in that ad here.


The problem with these claims is that correlation does not imply causation. Regarding these claims:

  1. This is a survey, not a controlled experiment in which people were randomly assigned to a control group and a treatment group and followed over years to see if there are statistically significant differences in their GPA, salary etc.
  2. There is omitted variable bias here. The same trait that made the students and others read the WSJ is possibly the driver behind their success. Self-motivated and driven people are going to equip themselves with every possible tool and training to get ahead in life. If it were not WSJ, they would have read other journals and newspapers to get ahead. While the claim that “Journal helps the student get ahead with a robust set of career preparation resources” is valid, the following statement, “Did you know students who read The Journal are 140% more likely to be starting a full-time job upon graduation?”, is misleading because it implies causation.

A few years back there was a TV commercial for WSJ that showed a man moving up in his career because of WSJ. The commercial starts with a man walking in the rain, who stops to pick up a copy of WSJ from a news vendor to protect himself from the rain. He later runs into an executive of his company in the elevator, who upon seeing the newspaper in his hand offers him an instant promotion. It is one thing to use humor to imply causation; no one will take it seriously. It is, however, not factually correct to use survey data and make a causation claim based on correlation.

Other reads: There was also an article on Fantasy Football that implies causation from correlation.