A teachable moment in charts, statistics and data pseudoscience

Here is a chart and the associated assertion on the best times to launch iOS apps. This comes to us from data collected and analyzed by SensorTower. (The website and image were linked on April 4th; hopefully they will have changed it all by the time you read this.)

The claim: Sunday is the best day to promote purchases, period.

Data collection: our Data Science team did a study of all the primary iOS categories to find out which days of the week typically have more estimated downloads and revenue.

I take it that the chart and data are self-explanatory. Now, can you point out the flagrant flaws in the chart, the data and the claim?

If you are not able to get past the beauty of the chart and the boldness of the claim, show it to a fifth grader who is not afraid of calling out that the emperor has no clothes.

This is labeled data science, which should not impress or overwhelm you. There is no science here, and by any current definition of “data science” this does not even come close. Because, you know, with data science there are hypotheses, statistics, data cleansing and data validation involved.

Here are the easiest ones you should be able to call out.

  1. Look at this chart and all the other similar charts for several categories in that blog post. Look at those nice smooth lines. Then look at the data points again. These are averages per weekday, hence discrete by definition, and must only be shown using bar/column charts.
  2. Look closely at the y axis. It seems to be a standard linear scale. Then what happened to 16? If these are percentages then they should add up to 100%. If I eyeball the data points from the graph I should get within a percentage point of 100%. But you get 14.9 + 14.3 + 14 + 13.8 + 13.7 + 14.8 + 17.1 = 102.6.
    We must conclude that the 17% is really 16%, and even then there is error in the data.
  3. Take a look at the bold assertion that “Sunday is the best day, period”. To make such a claim, the venerable data scientists should not eyeball it or take the 1% difference as significant, economically or statistically. They must run a Chi-squared test to see if the difference is statistically significant. When you do so with the help of an online calculator, you will find that it is not: the differences are just part of the randomness. You would think the data scientists would know to do this.
  4. Finally, look at all the other charts and the claims about each category of apps. If you were to do another cross-cut of these numbers and run another statistical test, you will find there is no statistically significant difference across the app categories.
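
If you would rather not trust an online calculator, points 2 and 3 can be checked in a few lines of Python. This is only a rough sketch, not the calculation the chart's authors (or the calculator) ran. It uses the seven values eyeballed from the chart, tests them against an even split across the week, and, as a loud assumption, treats the percentage points as if they were counts, because the post does not give the actual number of downloads. The function name `chi_square_uniform` is made up for this illustration.

```python
# Weekday shares (in percent) eyeballed from the chart, in the order listed in point 2.
shares = [14.9, 14.3, 14.0, 13.8, 13.7, 14.8, 17.1]

# Point 2: percentages of a whole should add up to roughly 100.
total = sum(shares)
print(f"sum of shares: {total:.1f}")


def chi_square_uniform(observed):
    """Chi-squared goodness-of-fit statistic against an even split across all days."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Point 3: is the weekday pattern distinguishable from an even split?
# ASSUMPTION: percentage points are used as if they were counts, since the
# real download counts are not published. The statistic scales with sample
# size, so real counts would give a different number.
statistic = chi_square_uniform(shares)

# Standard table value: chi-squared critical value for 7 - 1 = 6 degrees
# of freedom at the 5% significance level.
CRITICAL_5PCT_DF6 = 12.592

print(f"chi-squared statistic: {statistic:.3f}")
print(f"significant at 5%: {statistic > CRITICAL_5PCT_DF6}")
```

Under that assumption the statistic comes out far below the critical value, which matches the conclusion in point 3. To repeat the check with real numbers, replace `shares` with the actual per-weekday counts.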

I don’t do data science, at least not the way some of these sites do.

Even if the differences are statistically significant, one has to ask if they are economically significant. Is the 1-2% difference enough to shift your operating procedures? Just because you shift more of your promotion dollars to Sunday, will you keep getting more benefits? Don’t forget the Fallacy of Composition: if just one marketer takes advantage of Sunday that would work, but what if all of them shifted their launches to Sunday?

Where do you stand?