## Pig or Dog – Which Is Smarter? Metrics, Data, Analytics and Errors

How do you determine which interview candidate to hire? How do you evaluate the candidate you decided you want to hire (or decided you want to flush)?

How do you make a call on which group is performing better? How do you hold accountable (or explain away) bad performance in a quarter for one group vs. another?

How do you determine the future revenue potential of a target company you decided you want to acquire (or decided you don’t want to acquire)?

What metrics do you use? What data do you collect? And how do you analyze that to make a call?

Here is a summary of an episode of Fetch! with Ruff Ruffman, the PBS Kids TV show:

Ruff’s peer, Blossom the cat, informs him that pigs are smarter than dogs. Not believing her and determined to prove her wrong, Ruff sends two very smart kids to test the claim. The two kids go to a farm with a dog and a pig. They decide that the time taken to traverse a maze is the metric they will use to determine who is smarter. They design three different mazes:

1. A really simple straight line (a good choice, as this will serve as a baseline)
2. A maze with turns but no dead-ends (increasing complexity)
3. A maze with two dead-ends

Then they run three experiments, letting the animals traverse each maze one at a time and measuring the time for each run. The dog comes out ahead, taking less than ten seconds in each case, while the pig consistently takes more than a minute.

Let me interrupt here to say that the kids did not really want Ruff to win the argument. But the data seemed to show otherwise. So one of the kids changed the definition on the fly:

“Maybe we should re-run the third maze experiment. If the pig remembered the dead-ends and avoids them, then it will show the pig is smarter, because the pig is learning.”

And they do. The dog takes ~7 seconds, compared to the 5.6 seconds it took in the first run. The pig does it in 35 seconds, half the time of its previous run.

They write up their results. The dog’s performance worsened while the pig’s improved. So the pig clearly showed learning and the dog didn’t. The pig indeed was smarter.

We are not here to critique the kids. This is not about them. This is about us, leaders, managers and marketers who have to make such calls in our jobs. The errors we make are not that different from the ones we see in the Pigs vs. Dogs study.

Are we even aware we are making such errors? Here are five errors to watch out for in our decision making:

1. Preconceived notion: There is a difference between testing a hypothesis and proving a preconceived notion.

A hypothesis is, “Dogs are smarter than pigs.” So is, “The social media campaign helped increase sales.”

A preconceived notion is, “Let us prove dogs are smarter than pigs.” So is, “Let us prove that the viral video of the man on the horse helped increase sales.”

2. Using the right metric: What defines success and what “better” means must be defined in advance and should be relevant to the hypothesis you are testing.
Time to traverse a maze is a good metric, but is it the right one to determine which animal is smarter? Smart or not, dogs have an advantage over pigs – they respond to a trainer’s call and move in that direction. Pigs only respond to the presence of food. That seems unfair already.
Measuring a candidate’s presence may be good, but is it the right metric for the position you are hiring for? Measuring the number of views on your viral video is easy, but is it relevant to sales performance?
It is usually a bad choice to pick a single metric. You need a basket of metrics that, taken together, point to which option is better.
3. Data collection: Are you collecting all the relevant data, or only what is convenient and available? If you want to prove Madagascar is San Diego, then you will only look for white sandy beaches. If you stop after finding a single data point that fits your preconceived notion, you will end up taking a $9B write-down on that acquisition.
Was it enough to test one dog and one pig to make a general claim about dogs and pigs?
Was one run of each experiment enough to provide reliable data?
4. Changing definitions midstream: Once you decide on the hypothesis to test, the metrics and the experimental procedure, you should stick to them for the scope of the study and not change them when it appears the results won’t go your way.
There is nothing wrong with changing definitions, but then you have to start over and be consistent.
5. Analytics errors: Can you make sweeping conclusions about performance without regard to variations?
Did the dog really worsen or the pig really improve or was it simply regression to the mean?
Does the 49ers’ backup quarterback really have a hot hand that justifies benching Alex Smith? What you see as a sales jump from your social media campaign could easily be due to usual variations in sales performance. Did you measure whether the performance uplift is beyond the usual variations by comparing against a comparable baseline?
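To make that last check concrete, here is a minimal sketch of comparing a post-campaign number against baseline variation. All the sales figures are invented for illustration:

```python
import statistics

# Hypothetical weekly sales (units) before the campaign -- the baseline.
baseline = [102, 95, 110, 98, 104, 99, 107, 101]
campaign_week = 110  # sales in the week after the campaign

base_mean = statistics.mean(baseline)
base_sd = statistics.stdev(baseline)

# How many baseline standard deviations above the mean is the new figure?
z = (campaign_week - base_mean) / base_sd
print(f"mean={base_mean:.1f}, sd={base_sd:.1f}, z={z:.2f}")
# A z-score well under 2 is within usual week-to-week variation --
# the "jump" by itself is not evidence the campaign worked.
```

Here the apparent jump is only about 1.6 standard deviations above the baseline mean – the kind of swing this series produces on its own.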

How do you make decisions? How do you define your metrics, collect data and do your analysis?

Note: It appears from a different, controlled experiment that pigs are indeed smarter. But if they are so smart, how did they end up as lunch?

## 8 Flaws in A/B Split Testing

You have been using A/B split testing to improve your email campaigns and web designs. The core idea is to randomly assign participants to group A or B and measure the resulting performance – usually in terms of conversion. Then you perform statistical testing, either a t-test (incorrect for conversion data) or a Chi-square test, to see if the difference in performance between A and B is statistically significant at the 95% confidence level.

There are significant flaws with this approach:

1. Large samples: Using large samples makes it very likely you will find statistical significance even for small differences. At the same time, with large aggregate samples (larger than about 300) you lose segmentation differences.
2. Focus on statistical significance: Every test tool, sample-size calculator and article is narrowly focused on achieving statistical significance, treating that as the final word on the superiority of one version over the other.
3. Ignoring economic significance: Whether or not there is statistical significance, no test tool will tell you the economic significance of the result for your decision making.
4. Misleading metrics: When tools report that version A is X% better than version B, they are simply wrong. The hypothesis tested in A/B testing is that one version is better than the other – not by what percent.
5. All or nothing: When the test results are inconclusive, there is nothing to learn from these tests.
6. Discontinuous: There is no carryover of knowledge gained from previous tests. We do not apply anything learned from one test in later tests.
7. Test everything and test often: The method wrests control from the decision maker in the name of being “data driven”. It pushes one to suspend all prior knowledge (dismissed as hunches and intuition) and to test everything and test often, resulting in significant costs for minor improvements. Realize that the test-tool makers are incentivized by your regular and excessive testing.
8. Mistaking “X implies Y” for “Y implies X”: The hypothesis testing is flawed. What we actually test is, “how well does the data fit the hypothesis we assumed?” But at the end of the test we state, “the hypothesis is supported by the data and is true for all future data.”
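To make the Chi-square mechanics – and the large-sample flaw – concrete, here is a minimal sketch with invented conversion counts. The same 0.5-point difference in conversion rate (10.0% vs. 10.5%) fails the 95% threshold at a small sample but sails past it at a large one:

```python
def chi2_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson Chi-square statistic for a 2x2 conversion table."""
    total = n_a + n_b
    conv = conv_a + conv_b          # column total: conversions
    nonconv = total - conv          # column total: non-conversions
    chi2 = 0.0
    for obs_conv, n in ((conv_a, n_a), (conv_b, n_b)):
        for obs, col_total in ((obs_conv, conv), (n - obs_conv, nonconv)):
            expected = n * col_total / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Identical conversion rates (10.0% vs 10.5%) at two sample sizes.
small = chi2_2x2(100, 1_000, 105, 1_000)
large = chi2_2x2(10_000, 100_000, 10_500, 100_000)

# Critical value for 1 degree of freedom at the 95% level is 3.841.
print(f"n=1,000 each:   chi2={small:.2f} -> not significant")
print(f"n=100,000 each: chi2={large:.2f} -> significant")
```

Nothing about the underlying difference changed between the two runs – only the sample size did. That is why “statistically significant” cannot be the final word on whether the difference matters.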

The root cause of all these mistakes is using A/B testing for decision making. When you are deciding between two versions you are deciding which option will deliver better returns. The uncertainty is in choosing the version. If there is no uncertainty at all, why bother testing?

The way to reduce uncertainty is to collect relevant information. It is profitable to do so only if the cost to collect this information is less than the expected increase in return from reducing the uncertainty.
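As a toy illustration of that calculus – every number below is a hypothetical assumption, not a benchmark:

```python
# Is collecting the information worth its cost? (all figures hypothetical)
test_cost = 5_000           # cost to run the experiment ($)
annual_revenue = 1_000_000  # revenue flowing through the page ($)
expected_lift = 0.02        # relative gain if the better version is chosen
p_wrong_without_test = 0.5  # chance of shipping the worse version untested

# Expected increase in return from resolving the uncertainty.
value_of_information = p_wrong_without_test * expected_lift * annual_revenue

# Collecting the information is profitable only when its value
# exceeds the cost of collecting it.
worth_testing = value_of_information > test_cost
print(f"value of information=${value_of_information:,.0f}, "
      f"test cost=${test_cost:,}, worth testing: {worth_testing}")
```

Run the same arithmetic on a low-traffic page and the inequality flips – which is the point: the decision to test is itself an economic decision.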

You are not in the hypothesis-testing business. You are in the business of adding value to your shareholders (that is, you and your investors). To deliver value you need to make decisions in the presence of uncertainty. With all its flaws, A/B testing is not the right solution for decision making!

So stop using A/B testing!

What do I recommend? Send me a note to read a preview of my article on “Iterative Bayesian (TM)”.

## Making a Case for Practicing Evidence Based Management

I am repurposing Pascal’s wager to make a case for evidence-based management over intuition, gut feel, “blink”, fads and conventional wisdom.

The net is: in the presence of uncertainties (I treat this as a truism), the dominant strategy for a decision maker is to rely on hard evidence, experiments and analytics. There is nothing more to add to this argument beyond what Pascal already said.
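The wager structure can be sketched as a small payoff comparison. The payoffs below are invented purely to show the shape of the argument, not measured values:

```python
# Toy payoff table (hypothetical numbers): when your intuition happens to be
# right, evidence merely confirms it; when it is wrong, evidence catches the
# error early while pure intuition ships the wrong thing.
payoff = {
    ("evidence", "right"): 100,
    ("evidence", "wrong"): 90,
    ("intuition", "right"): 100,
    ("intuition", "wrong"): -50,
}

def expected(strategy, p_right):
    """Expected payoff of a strategy given P(your intuition is right)."""
    return (p_right * payoff[(strategy, "right")]
            + (1 - p_right) * payoff[(strategy, "wrong")])

# For any probability short of certainty, evidence comes out ahead.
for p in (0.2, 0.5, 0.8):
    print(p, expected("evidence", p), expected("intuition", p))
```

With these payoffs, relying on evidence has the higher expected value for every probability below 1 – you only break even if your intuition is guaranteed correct, which is exactly the uncertainty-free case where no decision aid is needed.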


## QED

Most people talk in absolutes, extrapolate from one incident based on their selective memory of it, and jump to general, sweeping conclusions that leave no room for alternative explanations. If at all they attempt to use any data, it is chosen by availability (what is easier to get) and by selection (only data that supports their foregone conclusions and preconceived notions). Behind every general theory, you can bet there is just one data point the proponent was able to recollect.

The other day I was talking to a professional connection over lunch. She is someone who had worked for a long time at one major valley company before recently switching to another. She said her previous company was unquestionably the best to work for, and that she was kicking herself for leaving it. According to her, everyone who left that employer to work for other brand-name valley companies was returning to the mother ship in droves. I could have grilled her on the numbers – how she could tell which was “the best” when she had worked at just two places, and whether her comfort factor was coloring her opinion – but I did not.

Clearly this goes on to prove my point, right?

## Under every Complex Formula is …

Look at some of these quotes from the news media on companies with a data-driven approach to management and business decisions:

1. Applying a complex equation to a basic human-resource problem is pure Google, a company that made using heavy data to drive decisions one of its “Ten Golden Rules” outlined in 2005. (WSJ on Google’s HR plans)
2. Like Google, Facebook calculated the relevancy and authority of information before deciding to display it to me. The News Feed was shockingly complex — calculating and ranking more than a trillion items per day — and the results were very satisfying.  (WSJ on Facebook Newsfeed)
3. Mr. Donahoe installed an entirely new system to determine which items appear first in a search. It uses a complicated formula that takes into account price and how well an item’s seller ranks in customer satisfaction. (WSJ on eBay)
4. What could be more baffling than a capitalist corporation that gives away its best services, doesn’t set the prices for the ads that support it, and turns away customers because their ads don’t measure up to its complex formulas? (Wired on Google)
5. CineMatch, on the other hand, is all math. It matches your viewing and rating history with people who have similar histories. It uses those similar profiles to predict which movies you are likely to enjoy. That’s what these recommendations really are – predictions of which movies you will like. (Netflix movie recommendation)

I am willing to bet that underneath the complexity is a multiple regression model, built with multiple variables and constantly tuned to better predict future behavior from past actions. Every business collects, or has the opportunity to collect, significant customer data. Companies like Google and eBay strive to be accurate 99% of the time or more, but building a regression model even with a handful of variables can improve decision making over driving without a dashboard.
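A "handful of variables" model really is this small. Here is a minimal sketch of a two-variable least-squares regression solved via the normal equations – the customer data and the variable names are invented for illustration, not taken from any of the companies above:

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2 via the normal equations."""
    A = [[1.0] + list(row) for row in X]   # design matrix with intercept column
    n, k = len(A), len(A[0])
    # Normal equations: (A^T A) b = A^T y
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    Aty = [sum(A[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting (fine for a few variables).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(AtA[r][col]))
        AtA[col], AtA[pivot] = AtA[pivot], AtA[col]
        Aty[col], Aty[pivot] = Aty[pivot], Aty[col]
        for r in range(col + 1, k):
            f = AtA[r][col] / AtA[col][col]
            for c in range(col, k):
                AtA[r][c] -= f * AtA[col][c]
            Aty[r] -= f * Aty[col]
    b = [0.0] * k                          # back substitution
    for r in range(k - 1, -1, -1):
        b[r] = (Aty[r] - sum(AtA[r][c] * b[c] for c in range(r + 1, k))) / AtA[r][r]
    return b

# Hypothetical customer history: x1 = past visits, x2 = past purchases.
X = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]
y = [12.0, 19.0, 24.0, 31.0, 36.0, 43.0]   # spend ($), made up to be linear

b0, b1, b2 = fit_linear(X, y)
predicted = b0 + b1 * 7 + b2 * 3           # score a new customer
```

In practice you would use a library solver and far more data, but even this toy version turns past behavior into a prediction – which is the dashboard the paragraph above is arguing for.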