This regression model is beautiful and correctly used

Yes, the X and Y are switched. But think about it, it is just convention

There appears to be a newfound craze for regression analysis in tech blogs. OK, maybe not: there was just this article and then another in TechCrunch. If you really want to understand and admire a fantastic use of regression analysis, you should read this article by WSJ’s Numbers Guy, Carl Bialik.

Carl set out to find why the San Diego Chargers had yet another disappointing season in 2011, winning fewer games than their talent and scoring would otherwise suggest:

“What’s frustrating about San Diego’s poor starts in recent seasons isn’t just that the team appears to have had too much talent to lose so many early games. It’s that the Chargers also outscore opponents by too much to explain their relatively poor record.”

Carl’s hypothesis: a team’s winning percentage for the season must be highly correlated with its cumulative (season-long) margin of victory.

Win percentage = Constant + Slope × Cumulative margin of victory

Note that this is not a predictive model of the next game, nor of whether a team will have a winning season. It is simply a model of the expected win percentage given the cumulative scores. No causation whatsoever is implied. Unlike the faulty model we saw, the data already existed and were not coded after the fact.
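To make the equation concrete, here is a minimal sketch of fitting the constant and slope by ordinary least squares. The season totals are made up for illustration and are not NFL data, and `fit_line` is a hypothetical helper name, not anything from the WSJ article.

```python
# Hypothetical (made-up) season totals, NOT real NFL data:
# (cumulative margin of victory in points, win percentage)
seasons = [
    (-120, 0.250),
    (-60, 0.375),
    (-10, 0.500),
    (40, 0.5625),
    (90, 0.6875),
    (150, 0.8125),
]


def fit_line(points):
    """Ordinary least squares for y = constant + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    return constant, slope


constant, slope = fit_line(seasons)
print(f"Win percentage = {constant:.3f} + {slope:.5f} * margin")
```

The fit only describes seasons that have already been played, which is the point of the post: it summarizes how wins and scoring margin moved together, nothing more.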

How would you use the model?

At mid-season you enter a team’s cumulative margin of victory (total points scored less total points against) and find the win percentage suggested by the model. If the actual number is significantly lower than the one suggested by the model, as in the case of the 2011 San Diego Chargers, you look for explanations for the poor record. At the outset it signals wide variance in the team’s performance: when they win, they win big, and when they lose, they lose the close ones. Then you look for reasons and fix them.
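The mid-season check described above might look something like the sketch below. The coefficients are hypothetical placeholders, not the values from the WSJ article, and the team line is invented; `expected_win_pct` and `shortfall` are illustrative names.

```python
# Hypothetical coefficients for illustration only;
# NOT the values estimated in the WSJ article.
CONSTANT = 0.5   # expected win percentage at a margin of zero
SLOPE = 0.002    # change in win percentage per point of cumulative margin


def expected_win_pct(margin):
    """Win percentage suggested by the linear model, clipped to [0, 1]."""
    return min(1.0, max(0.0, CONSTANT + SLOPE * margin))


def shortfall(points_for, points_against, wins, games):
    """Actual minus expected win percentage; negative means underperforming."""
    margin = points_for - points_against
    return wins / games - expected_win_pct(margin)


# Made-up mid-season line: outscoring opponents by 40 points, yet only 3 wins in 8.
gap = shortfall(points_for=200, points_against=160, wins=3, games=8)
print(f"actual - expected = {gap:+.3f}")
```

How large a negative gap counts as "significantly lower" is a judgment call that depends on how tightly the historical data cluster around the line; the sketch leaves that threshold to the reader.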

This example is by far the best out there of the correct use of regression, which by definition means looking backwards. This model only looks backwards; it does not predict the future from past events. In doing so it treats regression as what it is, correlation, and hence sidesteps the biases that creep in when correlation is read as prediction or causation.


Survivorship Bias and Other Flaws in Anderson’s FREE

In his new book, FREE: The Future of a Radical Price, Mr. Chris Anderson supports his arguments with many examples of businesses that used the razor-and-blades model, the advertising model, and the free + premium model. The last few pages of his book are just a list of examples of businesses that are, according to him, successfully implementing what he calls the “freemium” model. Are examples enough to state absolutes like “the future of a radical price”?

Even if that is enough, Mr. Anderson lists only those businesses that seem to have made it, at least for now, and does not include those businesses that tried many of the free models and failed. That is the classic survivorship bias. Even if we restrict ourselves to the new media businesses that Mr. Anderson focuses on, there are many instances of ventures that went under. Even the small subset one can find in TechCrunch’s “DeadPool” is a daunting number.
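A toy simulation shows why a list of winners proves so little. Everything here is assumed for illustration: the 10% survival rate, the 50/50 split between free and paid, and the population size are invented, and the pricing model is deliberately given no effect on survival.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the pricing model is assigned independently of
# survival, so by construction "free" has NO effect on the outcome here.
SURVIVAL_RATE = 0.10  # assumed, for illustration only
ventures = [
    {"free": rng.random() < 0.5, "survived": rng.random() < SURVIVAL_RATE}
    for _ in range(10_000)
]


def survival_rate(group):
    """Fraction of ventures in the group that survived."""
    return sum(v["survived"] for v in group) / len(group)


free = [v for v in ventures if v["free"]]
paid = [v for v in ventures if not v["free"]]

# A book's list of success stories contains only free ventures that survived.
success_stories = [v for v in free if v["survived"]]

print(f"survival among success stories:   {survival_rate(success_stories):.0%}")
print(f"survival among all free ventures: {survival_rate(free):.1%}")
print(f"survival among all paid ventures: {survival_rate(paid):.1%}")
```

Judged only by the success stories, free looks like a sure thing. Judged by every venture that tried it, free does no better than paid in this made-up world, which is exactly the comparison a list of survivors cannot make.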

Even among the businesses selected, the time horizon is too short to say they are successful or will deliver long-term profit growth. Mr. Anderson uses a different metric, “uptake among customers”, rather than profit to measure their success. His choice of metric is no accident: it cleverly frames the argument and directs his readers and listeners toward a metric that is irrelevant but supports his case.

The next problem is confusing correlation with causation. For the blockbuster success stories he cites, such as YouTube, he attributes the customer uptake to the free model. He uses Prof. Dan Ariely’s Hershey’s experiment to substantiate this claim of causation. You can see Prof. Ariely’s comments on people using his experiment on his blog. In the Hershey’s experiments, the claims rested on comparisons between control groups and treatment groups. Mr. Anderson, by contrast, makes his claim based simply on YouTube being free.

Businesses, before jumping on Mr. Anderson’s far-reaching conclusions, should ask about his decision-making process and analyze their own business based on hard data. As professors Pfeffer and Sutton point out in their book Hard Facts, the difference between an academic (who is much maligned by the new media Gurus) and a self-proclaimed Guru is that an academic gives you an open system of decision making, whereas a Guru gives you a closed system that talks in absolutes, ignores evidence, focuses only on the benefits, and minimizes the drawbacks of their recommendation.