I saw this study from HP that used analytics (okay, Big Data analytics, whatever that means here) to predict employee attrition.
HP data scientists believed a companywide implementation of the system could deliver $300 million in potential savings “related to attrition replacement and productivity.”
I must say that, unlike most of the data dredging that goes on with selective reporting, these data scientists started with a clear goal in mind and a decision to change before diving into data analysis. It is not the usual:
“Storage and compute are cheap. Why throw away any data? Why decide what is important and what is not? Why settle for sampling when you can analyze it all? Let us throw in Hadoop and we will find something interesting.”
Their work found:
Those employees who had been promoted more times were more likely to quit, unless a more significant pay hike had gone along with the promotion
The problem? Well, this is not a hypothesis they developed independently of the data and then collected data to test. That is the prescribed approach to hypothesis-driven data analysis. Even with that method, one cannot stop when the data fits the hypothesis, because the data can fit any number of plausible hypotheses.
The problem is magnified with Big Data, where even tiny correlations get reported as significant because of the sheer volume of data.
What does it mean that people who are promoted often quit?
Is frequent promotion the culprit? Isn’t it likely that those who are driven and add high value are more likely to get promoted often, more likely to want to take on new challenges, and also more attractive to other companies?
The study adds, “unless associated with a more significant pay hike”.
Isn’t it more likely that the company is either simply using titles to keep disgruntled employees happy or making up titles to hold on to high-performing employees without really paying them for the value they add? In either case, aren’t employees more likely to leave after a few namesake promotions that don’t really mean anything?
Let us look at the flip side. Why do people who are not promoted frequently end up staying? Why do companies give big raises to keep those who were promoted?
Will stopping frequent promotion stop the attrition? Or will frequent promotion with big pay raises stop it? Neither will have an effect.
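The alternate explanation raised above can be made concrete with a toy simulation. This is a hypothetical model, not HP’s data: an unobserved “drive” score makes an employee both more likely to be promoted and more likely to leave, while promotions themselves have no effect on leaving. The observational comparison still shows frequently promoted employees quitting more, yet decoupling promotions from drive (a stand-in for “stopping frequent promotion”) leaves overall attrition where it was. The function names, the probabilities, and the scale of 0 to 4 promotions are all assumptions chosen for illustration.

```python
import random


def simulate_employees(n, seed=0, promotions_follow_drive=True):
    """Toy model (hypothetical): latent drive raises both promotions and quitting.

    Promotions have NO causal effect on quitting in this model.
    Returns a list of (promotion_count, left_company) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        drive = rng.random()  # unobserved trait in [0, 1)
        if promotions_follow_drive:
            # Four promotion opportunities; driven people win more of them.
            promotions = sum(rng.random() < drive for _ in range(4))
        else:
            # Policy change: promotions handed out independently of drive.
            promotions = sum(rng.random() < 0.5 for _ in range(4))
        # Leaving depends on drive only, never on promotions.
        left = rng.random() < 0.05 + 0.30 * drive
        rows.append((promotions, left))
    return rows


def attrition(rows, keep=lambda promotions: True):
    """Share of employees who left, among rows whose promotion count passes `keep`."""
    outcomes = [left for promotions, left in rows if keep(promotions)]
    return sum(outcomes) / len(outcomes)


before = simulate_employees(20_000, seed=1)
after = simulate_employees(20_000, seed=2, promotions_follow_drive=False)

for label, rows in (("drive-linked promotions", before), ("decoupled promotions", after)):
    often = attrition(rows, lambda p: p >= 3)
    rarely = attrition(rows, lambda p: p <= 1)
    print(f"{label}: promoted often {often:.3f}, rarely {rarely:.3f}, overall {attrition(rows):.3f}")
```

The model only shows that the observed pattern is consistent with an explanation in which promotions do nothing; it cannot say which explanation holds at HP. That is exactly why alternate hypotheses have to be tested rather than assumed away.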
The study and the analysis fail to ask:
Is the business better off paying big raises to keep those who are frequently promoted than letting them leave?
Is the business better off if those who are not promoted often choose to stay?
That is the problem with this study, and with any Big Data analytics that does not start with a hypothesis developed outside the very data used to prove it. It finds the first interesting correlation, “frequent promotions associated with attrition,” and declares predictability without examining alternate explanations and root causes.
Big Data does not mean suspension of the mind and eradication of theory. The right flow remains:
- What is the decision you are trying to change?
- What are the hypotheses about drivers, developed by applying the mind and prior knowledge?
- What is the right data to test them?
- When the data fits a hypothesis, could alternate hypotheses also explain the data?
- How do the hypotheses change when new data comes in?
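The last step, revising hypotheses as new data comes in, can be sketched with Bayes’ rule over a small set of competing explanations. The two hypotheses, the equal priors, and the likelihoods below are invented for illustration: suppose a hypothetical pilot freezes promotions in one unit and attrition does not change, an outcome judged far more likely if drive explains both promotion and quitting than if promotion itself pushes people out.

```python
def update(priors, likelihoods):
    """Bayes' rule over a discrete set of competing hypotheses.

    priors: hypothesis -> prior probability (should sum to 1)
    likelihoods: hypothesis -> P(observed evidence | hypothesis)
    Returns hypothesis -> posterior probability.
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical starting beliefs before the pilot.
priors = {"promotion causes attrition": 0.5, "drive causes both": 0.5}

# Hypothetical evidence: promotions frozen in a pilot unit, attrition unchanged.
# These likelihoods are assumptions, not measurements.
no_change = {"promotion causes attrition": 0.2, "drive causes both": 0.8}

posterior = update(priors, no_change)
for hypothesis, prob in posterior.items():
    print(f"{hypothesis}: {prob:.2f}")
```

With these made-up numbers the balance shifts from 50/50 to 20/80; the next round of evidence would start from these posteriors as its priors, so no single dataset gets the final word.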
How do you put Big Data to work?