This regression model is beautiful and correctly used

Yes, the X and Y are switched. But think about it: it is just convention.

It appears there is a newfound craze for regression analysis in tech blogs. OK, maybe not; there was just this article and then another in TechCrunch. If you really want to understand and admire a fantastic use of regression analysis, you should read this article by WSJ's Numbers Guy, Carl Bialik.

Carl set out to find why the San Diego Chargers had yet another bad 2011 season, winning fewer games than their talent and scoring would otherwise suggest:

“What’s frustrating about San Diego’s poor starts in recent seasons isn’t just that the team appears to have had too much talent to lose so many early games. It’s that the Chargers also outscore opponents by too much to explain their relatively poor record.”

Carl's hypothesis: a team's winning percentage for the season must be highly correlated with its cumulative (season-to-date) margin of victory.

Win percentage = Constant + Slope × Cumulative margin of victory

Note that this is not a predictive model about the next game, nor about whether a team will have a winning season. It is simply a model of the expected win percentage given the cumulative scores. No causation whatsoever is implied. Unlike the faulty model we saw, the data already exist and are not coded after the fact.

How would you use the model?

At mid-season you enter a team's cumulative margin of victory (total points scored less total points against) and find the win percentage suggested by the model. If the actual number is significantly lower than the one suggested by the model, as in the case of the 2011 San Diego Chargers, you look for explanations for the poor record. At the outset it signals wide variance in the team's performance: when they win, they win big, and when they lose, they lose the close ones. Then you look for reasons and fix them.
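
Here is a minimal sketch of that workflow in Python, with made-up numbers (the margins, win percentages, and the 50-point example team are hypothetical, not Carl's data):

```python
import numpy as np

# Hypothetical league data: (cumulative margin of victory, win percentage)
margin = np.array([-120, -60, -30, 0, 25, 60, 90, 140], dtype=float)
win_pct = np.array([0.25, 0.31, 0.44, 0.50, 0.56, 0.63, 0.69, 0.81])

# Fit: Win percentage = Constant + Slope x Cumulative margin of victory
slope, constant = np.polyfit(margin, win_pct, 1)

# Suppose a team has outscored opponents by 50 points but won only 40% of its games
expected = constant + slope * 50
print(f"Model suggests {expected:.0%}; actual is 40%, a gap worth investigating")
```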

This example is by far the best out there in the correct usage of regression, which by definition means looking backwards. This model only looks backwards; it does not predict the future based on past events. And in doing so it treats regression for what it is, correlation, and hence avoids all kinds of biases.

 

Looking for falsifying evidence

Here is a puzzle I saw in Gruber’s flash card for elementary school children.

More people who smoke will develop lung cancer than those who do not smoke.

What research would show that smoking does not cause lung cancer?

This is not an argument about smoking, Big Tobacco, or morals. I like this question because it is simple, popular and familiar to most of us. The first statement makes us draw the most obvious conclusion: smoking causes lung cancer. Likely we won't look past this statement. And that is what makes the question very interesting.

The questions are: given all our knowledge and preconceived notions (so to speak), if you were asked to falsify the causation claim,

  1. What research will you do?
  2. What data will you seek?

This twist makes us stop, ignore our System 1 (Kahneman) and think. Finding one more example to support the claim is not difficult. Finding falsifying evidence is not only difficult but requires a different thought process.

You see numerous such causation claims in pulp-non-fiction business books (7 Habits, In Search of Excellence, Good to Great, Linchpin, Purple Cow) and blogs. Mom-and-apple-pie advice about startups, running a business, marketing, etc. bombards us every day on Twitter. Our System 1 wants us to accept these. After all, they are said by someone popular and/or in power, and the advice is so appealing and familiar.

Penn Jillette of Penn and Teller wrote,

“Magic is unwilling suspension of disbelief”

For example, the audience cannot ask to walk up to the stage to look inside the boxes. They have to accept the magician's word for it. That is unwilling suspension of disbelief. When it comes to gross generalizations and theories supported only by the very data used to form them (e.g., "What can we learn from xyz"), we don't have to suspend disbelief. We have the will to seek the evidence that will falsify the claim.

Do we stop and look for falsifying evidence or find solace in the comfort of such clichéd advice?

By the way, the answer to the Gruber puzzle lies in looking for a lurking variable. And there is none.

If you cared to run the numbers – Looking beyond the beauty of Infographics

I debated whether or not to write this article. There is really no point in writing articles that point out flaws in some popular piece. Neither the authors of those posts nor the audience care. Those who do care already understand the math, and this article adds no incremental value for them.

But the case in point is so egregious that it serves as a poster boy for the need to run the numbers, to test BIG claims for their veracity, and to look beyond the glossy eye candy.

This one comes from VentureBeat and has a very catchy title that got 2125 people to Like it on Facebook. All of them likely just read the title and were satisfied with it, or saw the colorful infographic and believed the claim without bothering to check for themselves. There is also comfort in knowing that they are not alone in their Likes.

You can't expect the general population to do critical thinking or any analysis, given the general lack of statistical skills and our cognitive laziness. It is System 1 at work with a lazy System 2 (surely you bought Kahneman's new book).

You would think the author of the article should have checked, but the poor fellow is likely a designer who can make eye-popping infographics but cannot run tests for statistical significance. He is likely an expert in stating whether rounded corners with certain shading are better #UX or not.

The catchy title and the subject also don’t help.

So almost everyone accepts the claim for what it is. But is there one bit of truth in VentureBeat's claim?

Let us run the numbers here.

Without further ado, here is the title of the article that 2125 Facebook users Liked.

Women who play online games have more sex (Infographic)

How did they arrive at the claim? They looked at data collected by Harris Interactive, which surveyed over 2000 adults across the US. Since the survey found that 57% of female gamers reported having sex vs. 52% of female non-gamers, the article makes the bold claim in its title. Here is a picture to support the claim.

The claim, supported by the beautiful picture, sounds plausible, doesn't it?

How would you verify whether the difference is not statistical noise?

You would run a simple crosstab (chi-square test), and there are online tools that make this step easier. What does this mean? You will test whether the difference between the number of female gamers who reported having sex and the number of female non-gamers who reported the same is statistically significant.

The first step is to work with absolute numbers, not percentages. We need the numbers that 57% and 52% correspond to. For this we need the number of females surveyed and what percentage of them are gamers and non-gamers.

The VentureBeat infographic says, “over 2000 adults surveyed”. The exact number happens to be 2132.

Let us find the number of female gamers. The article says that, of the gamers, 55% are female and 45% are male. This is not the same as saying 55% of females are gamers. Interestingly, they never reveal what percentage of the surveyed people are gamers. So we resort to data from other sources. One such source (circa 2008) says 42% of the population plays games online. We can assume that the number is now 50%.

So the number of gamers and non-gamers is 1066 each. Then we can say (using data from the infographic):

Number of female gamers = 55% of 1066 = 587
Number of female non-gamers = ?? (it is not 1066-587)

The survey does not say the number of males vs. females, but we can assume it is split evenly. If you want to be exact you can use the ratio from census.gov (which states 50.9% female to 49.1% male). So there are likely 1089 females surveyed.

That makes the number of female non-gamers = 1089 – 587 = 502.

The next step is to find the number of women who reported having sex (easy to do from their graph).

Number of female gamers who reported having sex = 57% of 587 = 335 (not having sex = 587 – 335 = 252)

Number of female non-gamers who reported having sex = 52% of 502 = 261 (not having sex = 502 – 261 = 241)

Now you are ready to build the 2×2 contingency table.

Then you run the chi-square test to see if the difference between the numbers is statistically significant.

H0 (null hypothesis): The difference is just random

H1 (alternative hypothesis): The difference is not just random; a higher proportion of female gamers have sex than female non-gamers.

You use the online tool and it does the work for you.
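
If you prefer code to an online calculator, here is a minimal sketch in Python using the counts derived above (scipy's Yates correction is turned off so the statistic matches the plain Pearson chi-square):

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table:   had sex, did not have sex
table = [[335, 252],   # female gamers (587 total)
         [261, 241]]   # female non-gamers (502 total)

# correction=False gives the plain Pearson chi-square for this 2x2 table
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}, dof = {dof}")
# chi-square is about 2.82 with p around 0.09, short of the 3.84 needed at p = 0.05
```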

What do we see from the results? The computed chi-square statistic is 2.82. For the difference to be statistically significant at the 0.05 level (95% confidence), the value has to be at least 3.84 (degrees of freedom = 1).

Since that is not the case here, we see no reason to reject the null hypothesis that the difference is just random.

You can repeat this for their next chart, which shows who has sex at least once a week, and you will again find no reason to reject the null hypothesis.

So the BIG claim made by VentureBeat’s article and its colorful infographic is just plain wrong.

If you have followed this far, you can see that it is not easy to seek the right data and run the analysis. Most importantly, it is not easy to question such claims from a popular blog. So we tend to yield to the claim, accept it, Like it, tweet it, etc.

Now that you have learned to question such claims and analyze them credibly, go apply what you learned to every big claim you read.

Independent Events, Inference Errors and Super Bowl XLV

Papa John’s, the pizza chain, is betting big that the Super Bowl today will not go into overtime. They are offering a free pizza to everyone who registers if the game goes into overtime. 

If millions of people register for this free offer, Papa John's stands to lose big. The costs are not only the cost to make and deliver the pizzas but also the lost sales, because the franchises will be busy making these free pizzas instead of serving paying customers.

If you watch any of the TV sports anchors speak, you will hear them say,

In the history of the Super Bowl, we have never gone into overtime.

Papa John's could take comfort in those predictions. But if the marketing manager who ran this campaign reasoned that the likelihood of overtime is close to 0 and hence the expected loss is $0, he could not be more wrong.
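
A back-of-the-envelope expected-loss calculation shows why. The probability, sign-up count, and cost per pizza below are made-up placeholders, not Papa John's numbers:

```python
p_overtime = 0.05          # assumed probability the game goes to overtime
registrations = 2_000_000  # assumed number of people who register
cost_per_pizza = 5.00      # assumed cost to make and deliver one free pizza

expected_loss = p_overtime * registrations * cost_per_pizza
print(f"Expected loss: ${expected_loss:,.0f}")  # $500,000 under these assumptions
```

Even a small but non-zero probability of overtime turns "expected loss of $0" into a real number.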

The teams, their rosters, weather conditions, techniques, equipment, context, player conditions, etc. are all completely different. The fact that the past 44 consecutive events turned out one way does not tell you anything about how the 45th event will turn out.

Consider this simpler example. You pick 45 random people from a crowd and ask each of them to pick a coin from their pocket and toss it. As you check the outcomes, one after the other, 44 of them get heads. But the probability that the 45th random person tossing her own coin will get heads is still one half.
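
A quick simulation (purely illustrative) makes the same point: the 45th toss does not care what the first 44 did, so simulating it on its own gives the answer directly.

```python
import random

random.seed(7)  # arbitrary seed, just for reproducibility

# Each toss is independent, so the first 44 outcomes are irrelevant;
# simulate the 45th toss by itself many times.
trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(trials))
print(f"P(heads on the 45th toss) is about {heads / trials:.3f}")  # close to 0.5
```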

Other similar statements you will hear from sportscasters that you should give no second thought to:

“the team that led by 10 points or more at half-time always won”
“the team that scored first won 60% of the time”

If you are in the mood for more math, here is a link to my article on Bayesian stats and independent events.

Regarding Papa John's decision: they could be doing this to collect customers' email addresses, and that is not a bad move as long as the goal is stated upfront and not after the fact. What we do not want to hear is, "Yes, we lost $10 million today, but look at the million emails we collected."

Fallacies of Cure-All Popular Prescriptions

Every Guru has his prophecy (some have more than one): the one cure-all we must adopt because it and it alone can help our sorry state of affairs. We are told to focus on:

– remarkable products

– astonishing stories

– impeccable service

– zero defections

– customer engagement

It is hard not to fall in love with these claims because they sound so acceptable and plausible. Even if there are no benefits in the short run, we are told that these have a long-term effect on the stock price.

There are at least three major fallacies in all such broad recommendations (the names are my own):

Infinite Resources or Zero Opportunity Cost Fallacy: We have only limited resources but many demands on them. If that were not so, we would not need to make any choices. Any claimed benefit from a method ignores that the resources invested in it have alternative uses. For instance, what is the cost of striving for 0% customer defection? Could you be investing that effort in new product or new market development instead?

Infinite Marginal Benefit Fallacy: Any action you take can deliver only limited marginal benefit. Depending on where your business is and the market you play in, you may see no benefit at all. Proponents of popular recommendations overestimate the benefits of their methods; they predict unbounded benefit from doing ever more of this one action, regardless of where your business is.

Isolation Fallacy: You are not the only player; there are many others in the game, and their moves are not taken into consideration. The claims ignore that the same methods are available to all the players, and that the other players need not play by any of your assumptions or your model of the world.

For instance, Best Buy has been the darling of customer service and social media Gurus. It had Twelpforce, engaged its customers with online reviews, and provided truly great service. All those methods came to naught when it guessed wrong about the product mix, pricing and consumer behavior. Its customers, quite rationally, went to the store to learn about the products and then ordered online, right from the store.

This is not to say there is no marginal benefit to adopting any of these recommended practices, but you need to be aware of these fallacies before investing your product and marketing dollars.

Verifying the obvious may show it isn’t true.

Do you practice evidence based management?

Let us hunt for something interesting in this data gold mine

How many times have you heard this?

We are collecting a lot of data on our customers/transactions/sales/logs; let us look at this goldmine to see if we can find anything interesting.

The problem with seeking something interesting is that you are bound to find it. You might call the next statement tautological, but the fact is, if it is interesting, you are bound to find it. To a determined data-miner, some interesting statistical outlier will eventually show up; then it is simply a matter of writing up the hypothesis after the fact.
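
A small simulation makes this concrete: mine pure noise for correlations (the metric, the attributes, and their counts below are all made up) and a handful of "interesting" results will show up at the usual 5% significance level.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
metric = rng.normal(size=500)              # a random "business metric"
attributes = rng.normal(size=(500, 100))   # 100 unrelated random attributes

# Count how many attributes correlate with the metric at p < 0.05 by chance alone
false_hits = sum(pearsonr(metric, attributes[:, i])[1] < 0.05 for i in range(100))
print(f"'Interesting' correlations found in pure noise: {false_hits} of 100")
# Typically around 5; discovered after the fact, they prove nothing
```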

Data mining, or as some might call it, data trolling, is looking for patterns in data sitting around, as opposed to deliberate decision making, which requires seeking specific information to reduce uncertainty. But data can fit any number of hypotheses. When mining for a cause, we are bound to pick

  • Ones that are most convenient, like the man searching for his lost key under the streetlight.
  • Ones that are familiar, based on our past experience and our beliefs; there are many fables about this.

The way to make informed decisions is to frame a hypothesis based on the best prior knowledge we have. Know that this is just a hypothesis, not a fact, and that it has uncertainties associated with it. Then collect specific data to refine it and reduce the uncertainty.

We will never know all the facts with certainty, but if we realize that what we know has uncertainty associated with it, and that there could be far more we do not yet know, we are on the right track.

How do you make your decisions?