This regression model is beautiful and correctly used

Yes, the X and Y axes are switched. But think about it: it is just a convention.

It appears there is a newfound craze for regression analysis in tech blogs. OK, maybe not; there was just this article and then another in TechCrunch. If you really want to understand and admire a fantastic use of regression analysis, you should read this article by the WSJ’s Numbers Guy, Carl Bialik.

Carl set out to find out why the San Diego Chargers had yet another bad season in 2011, winning fewer games than their talent and scoring would otherwise suggest:

“What’s frustrating about San Diego’s poor starts in recent seasons isn’t just that the team appears to have had too much talent to lose so many early games. It’s that the Chargers also outscore opponents by too much to explain their relatively poor record.”

Carl’s hypothesis: a team’s winning percentage for the season must be highly correlated with its cumulative (season-to-date) margin of victory.

Win percentage = Constant + Slope × Cumulative margin of victory
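To make the model concrete, here is a minimal Python sketch of fitting that line by ordinary least squares. It is an illustration under assumptions, not Carl's actual code or data. The function name `fit_line` and the sample numbers are hypothetical placeholders, and the fit is the textbook closed-form one-predictor solution.

```python
# Minimal sketch: fit Win% = Constant + Slope * Margin by ordinary least squares.
# Function name and sample data are hypothetical, for illustration only.


def fit_line(margins, win_pcts):
    """Return (constant, slope) of the least-squares line win_pct ~ margin."""
    n = len(margins)
    if n != len(win_pcts) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(margins) / n
    mean_y = sum(win_pcts) / n
    # Slope = covariance(margin, win%) / variance(margin)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(margins, win_pcts))
    sxx = sum((x - mean_x) ** 2 for x in margins)
    if sxx == 0:
        raise ValueError("margins must not all be equal")
    slope = sxy / sxx
    constant = mean_y - slope * mean_x  # the fitted line passes through the means
    return constant, slope


if __name__ == "__main__":
    # Hypothetical season-to-date point differentials and win percentages.
    margins = [-80, -30, 0, 25, 60, 110]
    win_pcts = [0.25, 0.40, 0.50, 0.55, 0.70, 0.85]
    constant, slope = fit_line(margins, win_pcts)
    print(f"Win% = {constant:.3f} + {slope:.5f} * margin")
```

With real league data you would feed in every team's cumulative margin and actual win percentage. The fitted constant should then sit near 0.5, because a team that scores as much as it allows is expected to win about half its games.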

Note that this is not a predictive model about the next game, nor about whether a team will have a winning season. It is simply a model of the expected win percentage given the cumulative scores. No causation whatsoever is implied. Unlike the faulty model we saw, the data already exist and were not coded after the fact.

How would you use the model?

At mid-season, you enter a team’s cumulative margin of victory (total points scored less total points allowed) and find the win percentage suggested by the model. If the actual number is significantly lower than the one suggested by the model, as in the case of the 2011 San Diego Chargers, you look for explanations for the poor record. At the outset, it signals wide variance in the team’s performance: when they win, they win big, and when they lose, they lose the close ones. Then you look for the reasons and fix them.
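The mid-season check described above is essentially a residual test: actual win percentage minus the model's expected value. The sketch below assumes the line has already been fitted. `CONSTANT`, `SLOPE` and the 0.15 flagging threshold are hypothetical placeholders, not values from the WSJ analysis.

```python
# Sketch of the mid-season check: compare each team's actual win percentage with
# the one the fitted line suggests. CONSTANT, SLOPE and the threshold are
# hypothetical placeholders, not values from the WSJ analysis.

CONSTANT = 0.50   # hypothetical: a team with zero margin wins half its games
SLOPE = 0.0025    # hypothetical: each point of margin adds 0.25 percentage points


def expected_win_pct(margin, constant=CONSTANT, slope=SLOPE):
    """Win percentage suggested by the model, clipped to the valid 0..1 range."""
    return min(1.0, max(0.0, constant + slope * margin))


def underperformers(teams, threshold=0.15):
    """Return (name, actual, expected, gap) for teams whose actual win% trails
    the model by more than `threshold`, largest shortfall first."""
    flagged = []
    for name, margin, actual in teams:
        expected = expected_win_pct(margin)
        gap = actual - expected
        if gap < -threshold:
            flagged.append((name, actual, expected, gap))
    return sorted(flagged, key=lambda row: row[3])


if __name__ == "__main__":
    # Hypothetical mid-season records: (team, cumulative margin, actual win%).
    teams = [("Team A", 60, 0.375), ("Team B", 40, 0.625), ("Team C", -20, 0.375)]
    for name, actual, expected, gap in underperformers(teams):
        print(f"{name}: actual {actual:.3f} vs expected {expected:.3f} ({gap:+.3f})")
```

A flagged team is only a prompt to look for explanations, such as losing close games. The model itself says nothing about why the gap exists.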

This example is by far the best one out there of the correct use of regression, which by definition means looking backwards. This model only looks backwards; it does not predict the future based on past events. In doing so, it treats regression for what it is, correlation, and hence accounts for all kinds of biases.

 

Leave it to Fast Company experts to find the number one predictor of success

Fast Company has an FC Expert Blog. I do not know who these experts are or what their qualifications are. They really are experts at declaring broad predictions, especially from reading a few lines of some old academic paper. One of the experts writes in their blog (Fast Company says it is not responsible for their wisdom):

Grit: The Top Predictor of Success

Why do some companies consistently outperform their competition? Why do some people become champions while others fall short? What skills do you need to improve to reach your highest potential?

How ironic that a back-to-basics approach carries the day: It turns out that good old-fashioned grit is the number one indicator of high performance.

The experts, it turns out, did not read the details of the paper they quote. Nor do they seem to understand how predictability is measured in statistical terms and what it means. Needless to say, they neglect to mention omitted variable bias and other experimental errors.

What the paper says is that grit, a trait defined by the authors, has an incremental R-square of 4%. That is, when you add a measure of grit to whatever linear regression model they were building, the share of variance explained by the model (its predictability) increased by 4 percentage points.
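Incremental R-square can be sketched in a few lines of Python. Fit the model with and without the extra predictor, then subtract the two R-square values. This is not the paper's model; its other variables are unknown here. The helper names (`ols`, `r_squared`, `incremental_r_squared`) are hypothetical. Least squares is solved from the normal equations so the sketch needs only the standard library.

```python
# Sketch: incremental R^2 = R^2(base predictors + new one) - R^2(base predictors).
# Helper names are hypothetical; OLS is solved via the normal equations.


def ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for y ~ 1 + X (X is a list of rows)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations A b = c, with A = X'X and c = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= f * A[col][j]
            c[i] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def r_squared(X, y):
    """Share of the variance in y explained by an OLS fit on X."""
    b = ols(X, y)
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r_squared(base_X, new_col, y):
    """Gain in R^2 from adding one predictor column to the base predictors."""
    full_X = [list(row) + [v] for row, v in zip(base_X, new_col)]
    return r_squared(full_X, y) - r_squared(base_X, y)
```

An `incremental_r_squared` result of 0.04 is the paper's 4%. The new variable explains 4 more percentage points of the variance, and everything else in the model explained the rest.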

4%. Just a 4% increase after accounting for all the other variables.

To go from here to “The Top Predictor of Success” is ludicrous.

Not only that: even the authors of the paper list severe limitations. The very definition of grit is amorphous; it is highly correlated with the Big Five personality traits (as classified in the psychology literature); and in their studies the authors measured it through self-reports by the participants.

From a study with such severe limitations (I am surprised it was even published), we get this sage advice from Fast Company experts:

It doesn’t matter if you’re rich or poor, come from a good neighborhood, have a fancy-pants degree, or are good looking. We all have nearly limitless potential, and the opportunity to seize it is waiting for you.

Let old-school grit and determination serve as the catalyst to achieving your own personal greatness.  You don’t need another tech gadget; just the same killer app that has been foundation of success since the beginning of civilization.

The expert has glossed over the gaping holes in the original study, ignored the effect of lurking variables, generalized a self-reported measurement of students to the entire population, and urges us to show grit.

I grit my teeth!

Yet Another Causation Confusion – School Ratings and Home Foreclosures

Here is the headline from the WSJ:

One Antidote to Foreclosures: Good Schools

The article is based on data mining by Location Inc., which found that

over past six months percentage of foreclosure (or “real-estate-owned”) sales went down as the school ranking went up in five metro areas

Next, we see a news story attributing clear causation. This one is harder for some to notice because it appears to be longitudinal, reported over a six-month period. If it were a simpler cross-sectional study, most readers would catch the fallacy right away.

First, let me point out that this is the problem with data mining: digging for nuggets in mountains of Big Data without an initial hypothesis and “finding” such causations.
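A small simulation shows why digging without a hypothesis misleads. The sketch correlates one six-point series, standing in for six months of foreclosure data, against many unrelated random series and keeps the best match. Every series is pure noise, so any "strong" correlation it finds is spurious by construction. The function names and parameter values are hypothetical.

```python
# Sketch: hypothesis-free mining of many candidate series against a short target
# series. All series are random noise, so the best match is spurious by design.
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_spurious_match(n_candidates=200, months=6, seed=1):
    """Strongest correlation between one noise series and many other noise series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(months)]  # stands in for the outcome
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(months)]  # unrelated "metric"
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(f"strongest correlation among pure noise: {best_spurious_match():+.2f}")
```

With only six data points and 200 candidates, the strongest match will typically have |r| above 0.8, even though nothing is related to anything.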

Second, the improvement in school ratings could be due to random factors that happen to coincide with lower foreclosure rates.

Third, even though the longitudinal aspect suggests causation, there are many omitted variables here: the common factors that are driving foreclosures down and school ratings up.

School rating is not an independent metric. It depends not just on teacher performance but also on parents. The same people who are willing to work with their kids are also likely to be fiscally responsible. Another controversial but proven factor is the effect of genes on children’s performance.
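The omitted-variable argument can be illustrated with a toy simulation. A hypothetical common factor, standing in for parents' involvement and fiscal responsibility, pushes school ratings up and foreclosures down. Neither variable affects the other. The factor, the noise levels and the function names are all assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_confounding(n=2000, seed=7):
    """Return (raw correlation, correlation after removing the common factor)."""
    rng = random.Random(seed)
    # Hypothetical common factor, e.g. how involved and fiscally careful parents are.
    parents = [rng.gauss(0, 1) for _ in range(n)]
    # School rating rises with the factor; foreclosures play no role in it.
    school = [p + rng.gauss(0, 0.5) for p in parents]
    # Foreclosure share falls with the same factor; school rating plays no role.
    foreclosure = [-p + rng.gauss(0, 0.5) for p in parents]
    raw = pearson(school, foreclosure)
    # Control for the factor by removing its effect. It is known exactly here
    # because we simulated it; with real data it would be estimated by regression.
    school_resid = [s - p for s, p in zip(school, parents)]
    foreclosure_resid = [f + p for f, p in zip(foreclosure, parents)]
    return raw, pearson(school_resid, foreclosure_resid)


if __name__ == "__main__":
    raw, controlled = simulate_confounding()
    print(f"raw: {raw:+.2f}, after controlling for parents: {controlled:+.2f}")
```

With these settings the raw correlation is strongly negative (about -0.8 in theory) and the controlled correlation is near zero. A headline built on the raw number would wrongly credit schools.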

If we ignore all these and focus our resources on improving school ratings to solve the foreclosure crisis, we will be making loud noises to chase away the wolves that cause eclipses.

3 Factors that Drive Customer Satisfaction Rating

When it comes to customer satisfaction ratings, more of everything isn’t the answer. From regression analysis of years’ worth of customer satisfaction ratings, and from related work done by others, we find that customer satisfaction is driven by 3 basic factors (from stated rating studies):

  1. Buying experience: How easy is it to evaluate choices and complete the buying process? While rational thinking dictates that these costs are incurred once and should be treated as sunk by customers, research (Journal of Management Information Systems, Winter 2007-08) shows that these costs remain sticky and customers treat the buying experience as part of the product experience.
  2. Delivering what is promised: Do the product quality and its realized benefits match what was promised and, most importantly, what the customer expected? This is not about delighting customers or delivering more than what is promised. A customer who walks into WalMart has one set of expectations, and one who walks into Nordstrom has another. For the segment you are targeting, the product benefits must match your positioning and messaging.
  3. Experience when things go wrong: When things go wrong, customers need support. How easy is it to get support, and how are they taken care of? No customer believes things will never go wrong, but the type of support they receive and how problems are handled are what customers treat as relevant to their overall satisfaction rating. For example, a Corolla customer does not expect the dealership to send a loaner car and a tow truck for service, but a Lexus customer does.

Go ahead and test this out today. Run a very simple 4-question survey of your customers (use a 1-10 scale):

  1. Please rate your overall satisfaction with our products and services.
  2. Please rate how satisfied you are with your buying experience (how easy it is to find what you need, evaluate options and complete the buying process)
  3. Please rate how satisfied you are with our product quality (meeting your expectations, delivers what was promised)
  4. Please rate your support experience (ease of getting help, timeliness, how you were treated)

Run a regression using (1) as the dependent variable and the rest as independent variables, and you will find out how relevant the 3 factors are to your own situation.
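The regression step can be sketched with the standard library alone. Overall satisfaction (question 1) is regressed on the three factor ratings (questions 2-4) by ordinary least squares, solved from the normal equations. The function name and the survey answers are hypothetical placeholders. All four questions share the same 1-10 scale, so the coefficients can be compared roughly. That comparison is a simplification that ignores standard errors and correlation between the factors.

```python
# Sketch: regress overall satisfaction (Q1) on buying experience (Q2), product
# quality (Q3) and support (Q4) by ordinary least squares, standard library only.
# The responses below are hypothetical placeholders.


def least_squares(X, y):
    """Return [intercept, b1, ..., bk] for y ~ 1 + X via the normal equations."""
    rows = [[1.0, *map(float, r)] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y] of the normal equations.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("answers are collinear; collect more varied responses")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for i in range(k):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [M[i][k] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical answers: (Q2 buying, Q3 quality, Q4 support, Q1 overall).
    responses = [(8, 9, 7, 8), (6, 7, 5, 6), (9, 8, 9, 9), (4, 6, 3, 4),
                 (7, 5, 8, 6), (5, 8, 6, 7), (3, 4, 5, 3), (9, 9, 6, 8)]
    X = [r[:3] for r in responses]
    y = [r[3] for r in responses]
    intercept, b_buy, b_quality, b_support = least_squares(X, y)
    print(f"overall = {intercept:.2f} + {b_buy:.2f}*buying + "
          f"{b_quality:.2f}*quality + {b_support:.2f}*support")
```

With a real survey you would want far more responses than predictors. A statistics package would also report significance for each coefficient.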

Caution: Regression analysis still only finds correlation. There are numerous lurking variables that were not fully studied. But research on other data sets makes it more likely that these variables have a causal relationship with customer satisfaction.

Sufficient but not Necessary!

Traditional media and social media are peppered with stories on how one can achieve success like other successful entities. Examples include 7 Habits, Good to Great, and numerous blog articles that follow a similar pattern. Almost all of these articles look at a successful business or person and look for observable positive traits. Then they attribute the success to the presence of those positive traits.

The general arguments against such studies include:

  1. Treating correlation as causation
  2. Different biases (survivorship, selection, availability, hindsight)
  3. Methodology errors like omitted variable bias
  4. We can’t stop just because the data fit a hypothesis; data can fit any number of hypotheses.

Even if we set all these flaws aside and accept that the success was indeed the direct result of the positive traits, there is another problem. These traits may be sufficient for success, but are they necessary?

Take an extreme example (for illustration). Let us say you see a tall person in a fruit orchard. You watch her effortlessly pick much more fruit than others, thanks to her height, which gives her access to more opportunities. Her height was sufficient to get more fruit, but was it necessary?
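The distinction can be stated as two simple checks over observations. A trait is sufficient if everyone who has it gets the outcome. It is necessary if everyone who gets the outcome has it. The orchard records below are hypothetical, including the picker who uses a ladder instead of height.

```python
# Toy illustration of sufficiency vs necessity over hypothetical orchard records.


def is_sufficient(records, trait, outcome):
    """Trait is sufficient if every record with the trait has the outcome."""
    return all(r[outcome] for r in records if r[trait])


def is_necessary(records, trait, outcome):
    """Trait is necessary if every record with the outcome has the trait."""
    return all(r[trait] for r in records if r[outcome])


orchard = [
    {"name": "tall picker", "tall": True, "high_yield": True},
    {"name": "picker with a ladder", "tall": False, "high_yield": True},
    {"name": "short picker, no ladder", "tall": False, "high_yield": False},
]

if __name__ == "__main__":
    print("height sufficient:", is_sufficient(orchard, "tall", "high_yield"))
    print("height necessary:", is_necessary(orchard, "tall", "high_yield"))
```

Studying only the tall, successful picker can establish the first check at best. The ladder case, which breaks the second check, never shows up in a sample of tall people.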

Next time you see articles on “6/7/8/9 ways to do marketing/product-launches like Apple/Google/twitter/GratefulDead”, even if you look past the biases, you should ask whether the methods are relevant to your situation and are indeed necessary for your success.

Is Customer Loyalty A Predictor Of Profitability?

Much has been said and written about the need for customer loyalty. The need to focus on and attain customer loyalty is intuitively clear to all marketers. Some of the key arguments for customer loyalty include:

  1. Reduced Customer Acquisition Costs: Since it costs $X to acquire a new customer, any customer you hold on to saves you $X. For example, it costs mobile providers $350 to acquire a new customer, and there are similar metrics for most products.
  2. The Loyalty Effect: The longer customers stay, the longer they keep paying you. A book by the same name claimed up to a 75% increase in a customer’s lifetime value if they stayed longer.
  3. Cross-Sell & Up-Sell: As you keep your customers and come to know more about them, you create additional revenue opportunities through cross-selling and up-selling.
  4. Price Tolerance: Loyal customers keep buying from you because they are delighted by your product, and they are less sensitive to prices. Some even claim that loyal customers do not bother to use coupons and promotions, thereby saving you money.
  5. Decreasing Cost to Serve: The better you understand your customers’ usage behavior and needs, the fewer mistakes you make in servicing them and hence the lower the cost to serve them.
  6. Bump From Word of Mouth: Loyal customers are also your best marketers; they are happy to write online reviews and promote your products to all their friends and web communities. This means they generate additional incremental revenue.

All these factors seem plausible, and the “gut feel” says they must be true. If even a subset of these six factors is at work, customer loyalty must be a very good predictor of sales growth and profitability.

We should be able to validate the following models:

Sales Growth = Constant + β1 × (Customer Loyalty)

Profitability = Constant + β2 × (Customer Loyalty)

(β1 and β2 are the weights of customer loyalty.)

In a study published circa 2000 in the journal Total Quality Management, researchers tested precisely these two models on a large set of products and services. The result?

Loyalty is a poor predictor of both sales growth and profitability. The R-square values are 6% for profitability and 2% for sales growth (for services, the numbers rise to 14.7% and 7.8%, respectively). That means only a tiny fraction of the variation in sales growth and profitability is explained by variation in customer loyalty.
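To get a feel for how weak these relationships are, the quoted R-square values can be turned back into correlation magnitudes. This assumes each model has a single predictor, as in the two models above, so R-square is the square of the correlation coefficient. Only |r| can be recovered this way; the sign (negative for product profitability) comes from the study. The label "headline" for the first pair of figures and the function name are hypothetical.

```python
import math

# R-square values as quoted above; "headline" is the first pair of figures,
# "services" the services-only pair.
REPORTED_R_SQUARED = {
    ("headline", "profitability"): 0.06,
    ("headline", "sales growth"): 0.02,
    ("services", "profitability"): 0.147,
    ("services", "sales growth"): 0.078,
}


def implied_abs_correlation(r_squared):
    """|r| implied by R-square in a one-predictor linear model (R-square = r**2)."""
    return math.sqrt(r_squared)


if __name__ == "__main__":
    for (segment, outcome), r2 in REPORTED_R_SQUARED.items():
        print(f"{segment:>8} {outcome:<13} R2={r2:6.1%}  "
              f"|r|={implied_abs_correlation(r2):.2f}  unexplained={1 - r2:.1%}")
```

Even the best case, services profitability at 14.7%, leaves about 85% of the variation unexplained by loyalty.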

Loyalty has a positive impact on sales growth, but, more strikingly, for products the impact on profitability is negative: the higher the loyalty, the lower the profitability. This means any attempt to “buy loyalty” with price cuts does bring you loyalty, but at lower profitability.

The net is: what seems so obvious isn’t so. This is not to categorically dismiss the need for loyalty, but its positive effects are clearly overrated. If those effects are so low, then there is a high opportunity cost to improving loyalty. You cannot put all the wood behind the loyalty arrow!

Sidebar:

Correlation means two variables are associated, and the extent of the association is expressed as the correlation coefficient. It ranges from -1 (when one is high, the other is low) to +1 (both are high together). A value of 0 means no (linear) correlation.

Predictability, measured as R-square, describes how well one variable predicts another. For a simple linear regression with one predictor, it equals the square of the correlation coefficient. So two variables with a correlation coefficient of 0.8 have a predictability of only 0.64. R-square is usually expressed as a percentage, so 64% means 64% of the variation in the dependent variable is explained by variation in the predictor variable. That said, correlation does not mean causation. There are other factors to consider, including but not limited to the statistical significance of the variables’ weights, omitted variable bias, etc.
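The sidebar's two definitions can be checked numerically. The sketch computes the correlation coefficient r, fits a one-predictor least-squares line, and confirms that the line's R-square (1 minus residual over total sum of squares) equals r squared. The data and function names are hypothetical.

```python
# Sketch: correlation coefficient r, and R-square of a one-predictor OLS line.
# For a single predictor the two are linked by R-square = r**2. Data are hypothetical.


def pearson_r(x, y):
    """Correlation coefficient, from -1 (opposite) to +1 (move together)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_of_line(x, y):
    """Share of variance in y explained by the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 2.9, 4.2, 3.8, 5.5, 5.9]
    r = pearson_r(x, y)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, "
          f"R-square of line = {r_squared_of_line(x, y):.3f}")
```

With several predictors, R-square is no longer the square of any single pairwise correlation. The simple identity only holds for the one-predictor case described in the sidebar.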