Implying Causation – Predictive Analytics Slippery Slope

Imagine, if you will, a child eating broccoli for the very first time. While eating, let us say the child sneezes a few times in succession and then proudly declares, “I think I am allergic to broccoli”. As a parent, or simply as a grown-up, it is not difficult for you to see the fallacy in the child’s case. One does not need an advanced degree in econometrics or statistics to reply, “eat your broccoli – correlation does not imply causation”. Consider the following real cases:

  1. From The Times Economix Blog:
     There’s a very strong positive correlation between income and test scores. (For the math geeks out there, the R² for each test average/income range chart is about 0.95.)

  2. From The WSJ opinion column:
     Study after study reveals that there are long-term career benefits to working as a teenager and that these benefits go well beyond the pay that these youths receive. A study by researchers at Stanford found that those who do not work as teenagers have lower long-term wages and employability even after 10 years.

  3. From WSJ half-page Ads targeting parents:
     Students who read The Journal are 76% more likely to have a GPA of 36% or higher

  4. From a research paper on subscription to library resources by universities:
     Working with Dr. Carol Tenopir of the University of Tennessee and consultant Judy Luther of Information Strategies, this single-case study demonstrates a $4.38 grant income for each $1.00 invested by the university in the library (ROI Value). The white paper University Investments in Information: What’s the Return? is posted on Library Connect. The results articulate the relationship between the value of research information and its impact on the funding of an institute.

  5. From a research paper from the London School of Economics:
     In terms of percentage growth, a 7 point increase in word of mouth advocacy (net-promoter score) correlated with a 1% increase in growth (1 point increase = .147% more growth). The measurement was done through telephone survey in 2005 and the revenue growth numbers are for 2003-2004.

Can you spot the fallacies in these claims? Are these seemingly erudite and well-researched claims any different from the claims of a smart child who wants to avoid broccoli? Why do we want to see correlation when none exists, or take correlation for causation? Why do we suspend our critical thinking when the results are presented by big brands and big universities and packed with tonnes of data and graphs?

Of all the cases listed above, the last one is the winner. Suppose that, in the chronology of events, event-2 follows event-1 in time. Saying that event-1 might have caused event-2 is a pardonable and ubiquitous mistake; it is the garden-variety correlation-causation confusion. But the last example measured advocacy in 2005 and credits it with revenue growth from 2003-2004, so it effectively says, “event-2 caused event-1”.

I do not know a word for this!

Correlation Causation Confusion

Here is a quote from today’s Journal opinion piece on the minimum wage increase:

Study after study reveals that there are long-term career benefits to working as a teenager and that these benefits go well beyond the pay that these youths receive. A study by researchers at Stanford found that those who do not work as teenagers have lower long-term wages and employability even after 10 years.

What the study found is a correlation, but WSJ uses it to imply causation: that not working as a teenager leads to lower long-term wages and employability. But isn’t it possible that there is an underlying cause for both of these observed characteristics (omitted variable bias)? Is it possible that the same reason that led to unemployment as a teenager is driving low wages and employability in later years?
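To make the omitted-variable point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the `hidden` factor, the wage formula and the sample size are hypothetical and have nothing to do with the Stanford data. Adult wages are generated from the hidden factor alone, with no effect from teenage work at all, yet the two still end up clearly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounded(n=20000, seed=0):
    """Simulate people whose teen work and adult wage share a hidden cause.

    All numbers are invented for illustration (assumption, not real data).
    """
    rng = random.Random(seed)
    worked, wages = [], []
    for _ in range(n):
        # Hypothetical omitted variable (e.g. family circumstances).
        hidden = rng.gauss(0, 1)
        # The hidden factor makes teenage work more likely...
        worked_as_teen = 1 if hidden + rng.gauss(0, 1) > 0 else 0
        # ...and it alone drives the adult wage. Note that
        # worked_as_teen does NOT appear in this formula.
        adult_wage = 50 + 10 * hidden + rng.gauss(0, 5)
        worked.append(worked_as_teen)
        wages.append(adult_wage)
    return worked, wages


if __name__ == "__main__":
    worked, wages = simulate_confounded()
    r = pearson(worked, wages)
    print(f"correlation(worked as teen, adult wage) = {r:.2f}")
```

An analyst who saw only the `worked` and `wages` columns would find a solid positive correlation and could easily write the same sentence the Journal did, even though in this toy world teenage work has zero causal effect by construction.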

On the other hand, for those with high wages and employability, is working as a teenager just one tool among many? Would they have used other means equally effectively to achieve what they wanted?

Correlation does not imply causation. The Stanford study was an observational study, not a controlled experiment in which researchers randomly selected teenagers, randomly assigned them to working and non-working groups, and then looked at their earning potential years later.
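The difference between observing and randomly assigning can also be sketched with a small simulation. Again, this is a toy model under stated assumptions: the `hidden` factor and the wage formula are hypothetical, and teenage work is given no causal effect on wages at all. The only thing that changes between the two runs is how teenagers end up in the working group.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wage_gap(randomized, n=20000, seed=1):
    """Mean adult wage of teen workers minus that of non-workers.

    Toy model with invented numbers (assumption, not real data).
    Teenage work has no causal effect on the wage here.
    """
    rng = random.Random(seed)
    workers, non_workers = [], []
    for _ in range(n):
        # Hypothetical omitted variable that drives the adult wage.
        hidden = rng.gauss(0, 1)
        if randomized:
            # Controlled experiment: a coin flip decides who works.
            works = rng.random() < 0.5
        else:
            # Observation: the hidden factor influences who works.
            works = hidden + rng.gauss(0, 1) > 0
        wage = 50 + 10 * hidden + rng.gauss(0, 5)
        (workers if works else non_workers).append(wage)
    return mean(workers) - mean(non_workers)


if __name__ == "__main__":
    print(f"observed gap:   {wage_gap(randomized=False):.2f}")
    print(f"randomized gap: {wage_gap(randomized=True):.2f}")
```

In the observational run the working group earns noticeably more, purely because of who selected into it. With the coin flip, the hidden factor is balanced across both groups and the gap shrinks to roughly zero, which is the true effect in this toy world.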

This is not the first time the Journal has pushed causation based on correlation. You can find more such causation confusion from WSJ here and here.