Data Mistakes Even Children Can Find

Note: I am happy to announce a new addition to my blog: Prithi Srinivasan, presently a fifth grader. Look for her thoughts and analysis on data, coding, probability and economics.


Take this graph on iOS book sales, for example. Look at it carefully.

Can you spot any mistakes? I sure can: three of them, and I’m only a fifth grader.

The big reveal:

1. The first mistake is that there should be an axis break between 0% and 14%, because every other step on the axis increases by the same amount.

2. There is one exception to that last statement. Between 15% and 17% there has to be a 16%, not a gap.

3. It should be a bar graph! Line graphs are meant to show change over time, as one big concept. This graph ought to be a bar graph because it is showing the percentage of book sales per day, and those are separate concepts.

If they get paid to create graphs with many errors, then I should get paid for fixing those errors.

A teachable moment in charts, statistics and data pseudoscience

Here is a chart and the associated assertion on the best times to launch iOS apps. This comes to us from data collected and analyzed by SensorTower. (The website and image were linked on April 4th; hopefully they will have changed it all by the time you read this.)

The claim: Sunday is the best day to promote purchases, period.

Data collection: our Data Science team did a study of all the primary iOS categories to find out which days of the week typically have more estimated downloads and revenue.

I take it that the chart and data are self-explanatory. Now, can you point out the flagrant flaws in the chart, the data and the claim?

If you are not able to get past the beauty of the chart and the boldness of the claim, show it to a fifth grader who is not afraid of calling out that the emperor has no clothes.

This is labeled data science, but that should not impress or overwhelm you. There is no science here, and by any current definition of “data science” this does not even come close, because data science involves hypotheses, statistics, data cleansing and data validation.

Here are the easiest ones you should be able to call out.

  1. Look at this chart and all the other similar charts for several categories in that blog post. Look at those nice smooth lines. Then look at the data points again. These are averages per weekday, hence discrete by definition, and should only be shown using bar/column charts.
  2. Look closely at the y axis. It seems to be a standard linear scale. Then what happened to 16? If these are percentages, they should add up to 100%. If I eyeball the data points from the graph, I should get within a percentage point of 100%. But you get 14.9 + 14.3 + 14 + 13.8 + 13.7 + 14.8 + 17.1 = 102.6.
    We must conclude that the 17% is really 16%, and even then there is error in the data.
  3. Take a look at the bold assertion that “Sunday is the best day, period”. To make such a claim, the venerable data scientists should not eyeball it or take the 1% difference as significant, economically or statistically. They must run a chi-squared test to see if the difference is statistically significant.
    When you do so with the help of an online calculator, you will find that the differences are not statistically significant and are just part of the randomness. You would think the data scientists would know to do this.
  4. Finally, look at all the other charts and the claims about each category of apps. If you were to do another cross cut of these numbers and run another statistical test, you would find there is no statistically significant difference across the app categories.
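The arithmetic in points 2 and 3 can be checked in a few lines of Python. This is a minimal sketch, not SensorTower's analysis: the seven shares are the values eyeballed from the chart above, the function name `chi_square_uniform` is made up for illustration, and the sample size of 100 is an assumption that treats the percentages as if they were counts, since the underlying download counts are not given here.

```python
# Shares eyeballed from the chart, in percent (the values quoted in point 2).
shares = [14.9, 14.3, 14.0, 13.8, 13.7, 14.8, 17.1]

# Chi-squared critical value for 6 degrees of freedom at the 5% level.
CRITICAL_CHI2_DF6_05 = 12.592


def chi_square_uniform(shares, sample_size):
    """Goodness-of-fit statistic against 'every weekday is equally likely'.

    sample_size is an ASSUMED number of observations; the shares are
    rescaled so that they sum to exactly that many observations.
    """
    total = sum(shares)
    observed = [sample_size * s / total for s in shares]
    expected = sample_size / len(shares)
    return sum((o - expected) ** 2 / expected for o in observed)


total = sum(shares)
stat = chi_square_uniform(shares, sample_size=100)

print(f"sum of shares: {total:.1f}%")
print(f"chi-squared statistic: {stat:.3f} (critical value {CRITICAL_CHI2_DF6_05})")
print("significant at 5%?", stat > CRITICAL_CHI2_DF6_05)
```

Under that assumption the statistic falls far below the critical value. Keep in mind that the statistic grows in proportion to the sample size, so the test should be rerun on the actual counts if they are ever published.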

I don’t do data science, at least not the way some of these sites do.

Even if the differences are statistically significant, one has to ask if they are economically significant. Is the 1-2% difference enough to shift your operating procedures? If you shift more of your promotion dollars to Sunday, will you keep getting more benefits? Don’t forget the fallacy of composition: if just one marketer takes advantage of Sunday, that would work, but what if all of them shifted their launches to Sunday?

Where do you stand?

Decision Making Under Uncertainties – Statistical Modeling

Whether it is filling out a March Madness bracket or making investment decisions on a venture, it all comes down to scenario analysis that takes the variabilities into account. Because it is the variance that kills you.

In past years The Journal ran a Blindfold March Madness Bracket to help us make data-driven decisions to fill out the bracket and pick the winner. This year they simplified it for us and yet made it extremely sophisticated. They codified the priorities, added a way to introduce a level of randomness and built a sophisticated statistical model for scenario analysis.

In essence we all can be Nate Silver with this model.

Whether it is filling out a bracket for entertainment or making a key investment decision, the process comes down to what WSJ recommends:

  1. Choose a starting point: What is important to you?
  2. Customize your priorities: Assign relative priorities. Is it the people? The market? Traction?
  3. Account for unknowns
  4. Understand your biases and quantify the level of bias you have, for example how you “feel” about the person
  5. Run a statistical simulation
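The five steps above can be sketched as a small Monte Carlo simulation. This is only an illustration under stated assumptions, not the WSJ model: the ventures, factor scores, weights, bias offset and noise level are all hypothetical numbers, and `simulate` is a made-up helper name.

```python
import random


def simulate(priorities, scores, bias, noise_sd=1.0, runs=10_000, seed=42):
    """Return the share of simulated scenarios in which each option wins.

    priorities: factor -> weight (steps 1 and 2)
    scores:     option -> {factor -> score}
    bias:       option -> explicit offset for how you "feel" (step 4)
    noise_sd:   spread of the random term standing in for unknowns (step 3)
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = {name: 0 for name in scores}
    for _ in range(runs):  # step 5: many random scenarios
        totals = {}
        for name, factors in scores.items():
            base = sum(weight * factors[factor] for factor, weight in priorities.items())
            totals[name] = base + bias.get(name, 0.0) + rng.gauss(0.0, noise_sd)
        wins[max(totals, key=totals.get)] += 1
    return {name: count / runs for name, count in wins.items()}


# Hypothetical inputs, for illustration only.
priorities = {"people": 0.5, "market": 0.3, "traction": 0.2}
scores = {
    "Venture A": {"people": 8, "market": 6, "traction": 5},
    "Venture B": {"people": 6, "market": 7, "traction": 6},
}
bias = {"Venture B": 0.2}  # a quantified soft spot for Venture B's founder

win_share = simulate(priorities, scores, bias)
for name, share in win_share.items():
    print(f"{name}: wins in {share:.1%} of scenarios")
```

The output is a distribution rather than a single pick: with these made-up numbers the option with the higher weighted score wins more often, but far from always, which is the variance the post warns about.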

How do you make decisions under uncertainty?

Freemium has run its course

This was originally published in Gigaom.

  • “We are now seeing the end of the freemium model — signing up users for free and trying to upsell,” said Christian Vanek, CEO of the Boulder-based SurveyGizmo, in a recent phone conversation.
  • “6.5 million unique users is not all that it’s cracked up to be. I don’t want hits. I want revenue. I want a real business,” said Matt Wensing, founder and CEO of Stormpulse, in an interview with Mixergy.
  • “Make a product people want to pay for,” said Marco Arment, founder of Instapaper, in a Planet Money interview.

Three easily available examples do not make indisputable evidence against freemium. Just like Dropbox, Evernote and RememberTheMilk do not make a case for freemium. But these three quotes reflect a return to the roots of marketing — starting with customer needs, choosing the needs you want to serve and getting your fair share of the value created.

In the oft-cited Hershey’s experiment that started the free-mania, behavioral economists from MIT tested customer preference for Hershey’s and Ferrero Rocher chocolates at two different price points. For one group, they offered Hershey’s at one cent and Ferrero Rocher for 26 cents. For another, they offered the chocolates at zero cents and 25 cents respectively. When the Hershey’s chocolate was free and the Ferrero Rocher chocolate was 25 cents, 90 percent of the participants chose Hershey’s. The $0 price seems to have worked magic in driving customer adoption. The result became the foundation of the freemium school of thought: free is free marketing. First use the free version to drive adoption and build a large customer base, and then find ways to monetize that base by upselling the paid version and selling extras.

Ninety percent is an eye-catching statistic in books about the freemium model, but let’s stop and ask some basic questions about running a profitable venture.

  1. What do you know about your target customers?
  2. What urgent needs do the free and paid versions meet for these customers?
  3. Will the products remain relevant in the customers’ future?
  4. If fifty other sellers stand next to you and give away free Hershey’s chocolates, Skittles etc., what will happen to your share of the market?
  5. As a startup founder, which customers should you focus on first with your limited resources?

The five questions above are the key principles of marketing. Unfortunately, choosing a freemium model does not help answer these questions. Worse, it muddles the answers by misdirecting startup founders to focus on the product rather than customer needs. Stormpulse, a Web-based platform for managing weather risk, learned that free can attract all the wrong customers. The company’s CEO Matt Wensing told Mixergy, “Free brought us recreational users who tried us for superficial reasons, while those who found real value were the enterprise customers.”

Here is an alternative, which unlike freemium is neither new nor a fad:

Start with the customers, not your product. The product could be new but the customer needs are not. Whether it is a “bits” product with zero marginal cost or an “atoms” product with non-zero marginal cost, customer needs come first. In fact, it is not a product until you have identified a set of customers whose needs you meet and who want to pay you for that value.

Make your choice. Stormpulse and the online survey platform, SurveyGizmo, both realized that a successful strategy involves making choices. They couldn’t go after every customer who is willing to try out their products. Instead, the leaders at both organizations chose to focus on enterprise customers, because these customers not only value the products but also have the budget to pay for them. Getting 90 percent of customers to take free Hershey’s chocolates with the hope that they will pay more for extras or will upgrade later is not a strategy. In fact, the presence of free products drags down the expected value of a customer, which is another reason why SurveyGizmo decided to downplay its free offering.

Get your fair share of the value created. As Instapaper’s Arment said, charging for the product is still the simplest of all business models. Product innovation does not mean business model innovation. If your product adds compelling value to customers, charging for it is simply getting your fair share of the value created. You do not have to be ashamed of making a profit.

A small percentage of a very large number is indeed a large number, but can your startup stay solvent while you wait for the conversion to kick in? Freemium only offers the hope that non-paying users will fall in love with your product and start paying for it. Shooting an unlimited number of free bullets and hoping some will hit the target is a shotgun approach to monetization. It’s time to take a deliberate and targeted approach. Or as Vanek told me in our conversation, “it is time to retire the shotgun.”
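To put numbers on "a small percentage of a very large number", here is a back-of-the-envelope sketch. Every figure in it is hypothetical (user count, conversion rate, price and cost to serve are not taken from any company mentioned above), and the function names are made up for illustration.

```python
def monthly_margin(users, conversion_rate, price, cost_per_user):
    """Monthly revenue from paying users minus the cost of serving ALL users."""
    revenue = users * conversion_rate * price
    cost = users * cost_per_user  # free users still cost something to serve
    return revenue - cost


def break_even_conversion(price, cost_per_user):
    """Conversion rate at which the expected value of a signup is zero."""
    return cost_per_user / price


# Hypothetical freemium product: 1M users, 2% pay $5/month, $0.15/month to serve each user.
margin = monthly_margin(users=1_000_000, conversion_rate=0.02, price=5.0, cost_per_user=0.15)
needed = break_even_conversion(price=5.0, cost_per_user=0.15)

print(f"monthly margin: ${margin:,.0f}")
print(f"conversion needed to break even: {needed:.1%}")
```

With these made-up inputs the paying 2% bring in a large-sounding revenue figure, yet the business loses money every month until conversion climbs past the break-even rate.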

A Frequentist and a Bayesian Frequent a Bar

A Frequentist and a Bayesian have been going to this bar for the past five evenings. The bar is a special kind that offers only two kinds of beers – one is Amber and the other is Dark.

The first evening the Frequentist orders Amber and the Bayesian orders Dark.

The second evening the Frequentist orders Amber and the Bayesian orders Dark.

The third evening the Frequentist orders Amber and the Bayesian orders Dark.

The fourth evening the Frequentist orders Amber and the Bayesian orders Dark.

The fifth evening the Frequentist orders Amber and the Bayesian orders Dark.

This is the sixth evening.

What are the chances the Frequentist will order Amber?

What are the chances the Bayesian will order Dark?

Your answer will tell you whether you are a Frequentist or a Bayesian: a mindless dashboard-driven manager or a hypothesis-setting, data-seeking, dynamic decision maker.

Life is a series of probabilities – our job as decision makers is to seek relevant data to reduce the uncertainties.

For extra credit, what are the chances that one of the two will be run over by a bus or that the bar shuts down for any number of reasons? (Black Swan)
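One conventional way to contrast the two mindsets on the five evenings of data is sketched below. It is an illustration, not necessarily the answer the puzzle is after: the frequentist number is the plain observed frequency, and the Bayesian number assumes a uniform Beta(1, 1) prior, which is my assumption rather than anything stated in the story.

```python
from fractions import Fraction


def frequentist_estimate(successes, trials):
    """Observed relative frequency (the maximum-likelihood estimate)."""
    return Fraction(successes, trials)


def bayesian_estimate(successes, trials, prior_a=1, prior_b=1):
    """Posterior mean under an assumed Beta(prior_a, prior_b) prior."""
    return Fraction(successes + prior_a, trials + prior_a + prior_b)


# Five evenings, the same order every time.
freq = frequentist_estimate(5, 5)
bayes = bayesian_estimate(5, 5)

print(f"frequency-based estimate: {freq} ({float(freq):.3f})")
print(f"posterior mean with uniform prior: {bayes} ({float(bayes):.3f})")
```

The frequency-based estimate leaves no room for a different order on the sixth evening, while the prior keeps some probability in reserve and would keep shifting as new evenings are observed.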

Recipe for Minimum Viable Versioning

Product versioning is the simplest and fairest means of price discrimination. While it may sound unpalatable to idealistic minds, price discrimination is actually better for customers and businesses. That is, when done right. Otherwise it becomes operationally expensive, ineffective in the market or, worse, turns off customers. That is where even established businesses fail. I provide a very simple and testable recipe for product versioning.

First I want to expand on the benefits of price discrimination. Price discrimination is charging different customers different prices based on their willingness to pay. It is fair when customers willingly choose the price they pay rather than having marketers impose it. It is clearly good for businesses because it helps maximize profits. It is good for customers for many reasons:

  1. It allows them to experience a product they otherwise would not be able to afford, like a no-frills discount airline that makes overseas travel possible for many.
  2. It allows customers to pay for only what they value and not pay for extras.
  3. It gives them the choice: sometimes they may want to splurge, while most times sticking to basics.
  4. It gives them flexibility and convenience, like the express line at amusement parks.

When done right, for businesses, the clear benefit is profit maximization. There are also other significant benefits:

  1. Understand customer segmentation, purchase occasions and behavior.
  2. Optimize capacity, be it the manufacturing capacity of a breakfast cereal brand or the web-scale capacity of a cloud service.
  3. Refine (pivot) current product offerings to better tune to customer needs.
  4. Find the right price points for the products without being tied down to a single wrong price.
  5. Clear separation of customers, like Nordstrom and Nordstrom Rack.
  6. Surface customers who have even higher willingness to pay.

So it is better to offer multiple versions of your product even if it is in its MVP (Minimum Viable Product) stage. This may sound contradictory to the concept of an MVP; after all, an MVP is meant to test the demand and discover product-market fit with limited resources. How can a startup afford to invest in multiple product versions when it has only limited resources and is lean by definition?

Offering multiple versions of the same product does not mean investing in yet another product line and doubling your development cost. That is especially not recommended for a startup that is trying to validate its MVP in the chosen customer segment. That leads us to a really simple and effective recipe for Minimum Viable Versioning.

The underlying economic principle is – customer value perceptions are different and if they perceive a value difference (real or not) they will gravitate towards the version that they believe offers them the most value.

Think about that for a minute. This means you do not build three or four different products but make the customers think there are three or four products with different value offerings. That is, you can build just one product and yet price discriminate if you can nudge your customers to believe there are multiple products.

Take, for example, the most quoted case of printer price discrimination, where IBM added a chip to slow down print speed. While you could make such a product line change, it does entail manufacturing costs and operational complexities.

What if IBM didn’t really add a delay chip but simply changed the specs to say slower speed? A customer who values speed of printing is not going to be tempted by the lower priced “slower” version, while the price sensitive customers are happy to buy it.

Applying this recipe to present-day cloud services, you can offer a version that is nominally limited in capacity without building any code to enforce that restriction. For example, offer two versions, one with a 2GB capacity limit and one with a 10GB capacity limit. Those who value more capacity will self-select into the higher priced, larger capacity version, and the price sensitive customers will stick to the lower capacity version.

Some of those who picked your 2GB limited version may end up using more than that. But so what? Most of those who believe they want more than 2GB self-selected into your higher priced version, and you gain better market adoption, better understanding of segment-version fit and better profit without changing the product one bit.
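The self-selection argument can be sketched with a toy model. Everything here is hypothetical: the two segments, their sizes, their perceived values and the prices are invented for illustration, and the rule that a customer picks the version with the largest perceived surplus (value minus price) is a simplifying assumption.

```python
def choose(values, prices):
    """Pick the version with the largest perceived surplus, or None if nothing is worth its price."""
    best = max(prices, key=lambda version: values[version] - prices[version])
    return best if values[best] >= prices[best] else None


def revenue(segments, prices):
    """Total revenue when every customer self-selects among the offered versions."""
    total = 0
    for size, values in segments:
        pick = choose(values, prices)
        if pick is not None:
            total += size * prices[pick]
    return total


# (number of customers, perceived value of each version in dollars) -- hypothetical.
segments = [
    (60, {"2GB": 10, "10GB": 12}),  # price sensitive, sees little value in extra capacity
    (40, {"2GB": 12, "10GB": 30}),  # values capacity highly
]

two_versions = revenue(segments, {"2GB": 8, "10GB": 20})
single_high = revenue(segments, {"10GB": 20})
single_low = revenue(segments, {"10GB": 8})

print("two versions:", two_versions)
print("one version at $20:", single_high)
print("one version at $8:", single_low)
```

In this made-up market either single price leaves money on the table, one by pricing out the price sensitive segment and the other by undercharging the customers who value capacity, while the two labeled versions capture both.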

Remember the economic principle – it is not the real value difference as much as the perceived value difference. Value is in the minds of customers with their wallets out and ready to pay.

So go ahead and build one MVP. But offer three versions (Why three?) that CLEARLY and EXPLICITLY communicate the value differences to customers so they can self-select. While measuring product-market fit with the MVP, you can now also measure price perception and segment-version fit. If you find exceptional demand for your “low-end” version, find a way to build it at lower cost so you get to keep more of the price as profit. Until then, they do not need to know it is the same product.