Implying Causation – Predictive Analytics Slippery Slope

Imagine, if you will, a child eating broccoli for the very first time. While eating it, let us say the child sneezes a few times in succession and then proudly declares, “I think I am allergic to broccoli”. As a parent, or simply as a grown-up, it is not difficult for you to see the fallacy in the child’s reasoning. One does not need an advanced degree in econometrics or statistics to reply, “eat your broccoli – correlation does not imply causation”. Consider the following real cases:

  1. From The Times Economix Blog:
     “There’s a very strong positive correlation between income and test scores. (For the math geeks out there, the R² for each test average/income range chart is about 0.95.)”

  2. From The WSJ opinion column:
     “Study after study reveals that there are long-term career benefits to working as a teenager and that these benefits go well beyond the pay that these youths receive. A study by researchers at Stanford found that those who do not work as teenagers have lower long-term wages and employability even after 10 years.”

  3. From WSJ half-page ads targeting parents:
     “Students who read The Journal are 76% more likely to have a GPA of 3.6 or higher”

  4. From a research paper on subscription to library resources by universities:
     “Working with Dr. Carol Tenopir of the University of Tennessee and consultant Judy Luther of Information Strategies, this single-case study demonstrates a $4.38 grant income for each $1.00 invested by the university in the library (ROI Value). The white paper University Investments in Information: What’s the Return? is posted on Library Connect. The results articulate the relationship between the value of research information and its impact on the funding of an institute.”

  5. From a research paper from the London School of Economics:
     “In terms of percentage growth, a 7 point increase in word of mouth advocacy (net-promoter score) correlated with a 1% increase in growth (1 point increase = .147% more growth). The measurement was done through telephone survey in 2005 and the revenue growth numbers are for 2003-2004.”

Can you spot the fallacies in these claims? Are these seemingly erudite and well-researched claims any different from the claims of a smart child who wants to avoid broccoli? Why do we want to see correlation where none exists, or take correlation for causation? Why do we suspend our critical thinking when the results are presented by big brands and big universities, and packed with tons of data and graphs?

Of all the cases listed above, the last one is the winner. Suppose event-2 follows event-1 in time. It is a pardonable and ubiquitous mistake to say that event-1 might have caused event-2. This is the garden-variety correlation-causation confusion. But the last example measured advocacy in a 2005 survey and paired it with revenue growth from 2003-2004, so in effect it says, “event-2 caused event-1”.

I do not know a word for this!
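
For readers who like to see the garden-variety confusion in numbers, here is a minimal sketch in Python. It uses made-up data and is not drawn from any of the studies quoted above. It assumes a hidden factor drives two measurements that have no effect on each other. The names `hidden`, `x`, `y`, `pearson` and `simulate` are all hypothetical, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations for synthetic data.

    Assumption for illustration: x and y never influence each other;
    both simply copy a hidden factor and add their own noise.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    x = [h + rng.gauss(0, 0.3) for h in hidden]
    y = [h + rng.gauss(0, 0.3) for h in hidden]

    # Correlation as a headline would report it.
    raw = pearson(x, y)

    # Correlation after subtracting the hidden factor from both series.
    x_rest = [a - h for a, h in zip(x, hidden)]
    y_rest = [b - h for b, h in zip(y, hidden)]
    adjusted = pearson(x_rest, y_rest)
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"raw correlation:      {raw:.2f}")
    print(f"after removing cause: {adjusted:.2f}")
```

Under these assumptions the raw correlation comes out strong even though neither series causes the other, and it largely disappears once the hidden factor is subtracted. Real data rarely hands you the hidden factor so neatly, which is exactly why a high R² alone settles nothing.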