Update: This article will help you understand my Gigaom guest post on Pinterest revenue: How much does Pinterest actually make?
One of the games my 7-year-old and I play while driving home from her school is guessing whether mom will be home when we arrive.
I ask, “What are the chances mom will be home when we arrive?”
She almost always replies, “50-50.”
Not bad for someone just learning to enumerate the possibilities and find the fraction. When we arrive home, mom is either there or she is not. So 50-50 seems reasonable.
But are the chances really 50-50? If not, how would we find them?
Well, let us start with some safe, feel-good assumptions: my drive time is constant, there is a mom, she always leaves at a fixed time, and she will arrive.
Other than that, we need to know:
- What time is it now?
- What is her mean drive time?
- What is the standard deviation of drive time?
Assume that the drive times are normally distributed with the stated mean and standard deviation. The question then becomes: in what percentage of scenarios does her drive time put her home before we arrive? That is the probability we were looking for, and it is not 50-50 simply because there are only two outcomes.
Here we built a very simple model. But who knows what the mean is, let alone the standard deviation? We do not. So we do the next best thing: we estimate. We do not estimate the mean and standard deviation directly. Instead we estimate a low and a high value such that the drive time falls in that range in 90% of the cases. Stated another way, there is only a 10% chance the drive time is outside this range.
This is the 90% confidence interval: we are 90% confident the value is in this interval. Once we have it, simple math gives us the mean and standard deviation.
The mean is the average of the low and high values.
The standard deviation is the difference between the high and low values divided by 3.29, the number of standard deviations that the middle 90% of a standard normal curve spans (±1.645σ around the mean).
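Here is a minimal sketch of that conversion in Python, using only the standard library. The function name `normal_from_90ci` and the 20 to 40 minute interval are made up for illustration; they are not numbers from the drive home.

```python
from statistics import NormalDist

def normal_from_90ci(low, high):
    """Turn a 90% confidence interval into (mean, standard deviation)."""
    # 5% of the probability sits in each tail, so the high value is the
    # 95th percentile of the normal curve, about 1.645 sigma above the mean.
    z = NormalDist().inv_cdf(0.95)
    mean = (low + high) / 2
    sigma = (high - low) / (2 * z)  # full interval width is about 3.29 sigma
    return mean, sigma

# Hypothetical estimate: the drive takes 20 to 40 minutes, 90% of the time.
mean, sigma = normal_from_90ci(20, 40)
print(f"mean = {mean:.1f} min, sigma = {sigma:.2f} min")
```

Computing the divisor from `inv_cdf` rather than hard-coding 3.29 makes it easy to switch to, say, an 80% interval later.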
Once you have the mean and standard deviation, you can do the math to find the percentage of scenarios where the drive time is below a certain value.
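That math is the cumulative distribution function of the normal curve. A small sketch follows, again with invented numbers: the interval and the 32 minutes we have available are assumptions, not measurements.

```python
from statistics import NormalDist

# Hypothetical inputs, in minutes counted from the moment mom leaves.
low, high = 20, 40              # 90% confidence interval for her drive time
minutes_until_we_arrive = 32    # how long after her departure we get home

# Convert the interval into a normal distribution for her drive time.
z = NormalDist().inv_cdf(0.95)  # about 1.645
drive_time = NormalDist(mu=(low + high) / 2, sigma=(high - low) / (2 * z))

# Mom is home if her drive takes no longer than the time available.
p_home = drive_time.cdf(minutes_until_we_arrive)
print(f"chance mom is home when we arrive: {p_home:.0%}")
```

With a mean of 30 minutes and 32 minutes available, the answer lands a bit above 50%, which is exactly why "two outcomes" does not mean 50-50.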
This is still simple. We treated drive time as the measurable quantity and worked with it. But drive time is likely made up of many different components, each a random variable of its own: for instance, time to get out of the parking lot, time to get on the highway, etc. There is also the possibility that the start time is no longer fixed and varies as well.
(If you want to build a more realistic model, you should also model my drive time as a random variable with its own 90% confidence interval estimate. But let us not do that today.)
In such a case, instead of estimating the whole, we estimate 90% confidence intervals for each of these parts. In fact, this is the better and preferred approach, since we can make better and tighter estimates for smaller values than for the total drive time.
How do we go from 90% confidence interval estimates of these component variables to the estimate of drive time? We run a Monte Carlo simulation to build the normal distribution of the drive time variable based on its component variables.
This is like imagining driving home 10,000 times. For each iteration, randomly pick a value for each of the component variables based on its normal distribution (mean and sigma) and add them up:
drive time (iteration n) = exit time (n) + getting to highway time (n) + …
Once you have these 10,000 drive times, find the percentage of scenarios in which the drive time is less than a certain value. That is the probability we were looking for.
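The simulation can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the four legs, their 90% confidence intervals, the 34 minutes available, and the fixed random seed (used only so that reruns give the same answer).

```python
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # half-width of a 90% interval, in sigmas

def to_mean_sigma(low, high):
    """Convert a 90% confidence interval into (mean, sigma)."""
    return (low + high) / 2, (high - low) / (2 * Z90)

# Hypothetical 90% confidence intervals for each leg of the drive, in minutes.
components = {
    "exit parking lot": (2, 6),
    "get to highway": (5, 11),
    "highway": (10, 20),
    "local streets": (3, 7),
}

def simulate(components, iterations=10_000, seed=42):
    """Return one simulated total drive time per imagined trip home."""
    rng = random.Random(seed)
    params = [to_mean_sigma(lo, hi) for lo, hi in components.values()]
    # One iteration = one draw per component, summed into a total drive time.
    return [sum(rng.gauss(mu, sigma) for mu, sigma in params)
            for _ in range(iterations)]

drive_times = simulate(components)

minutes_until_we_arrive = 34  # hypothetical time available to mom
at_home = sum(t <= minutes_until_we_arrive for t in drive_times)
p_home = at_home / len(drive_times)
print(f"chance mom is home when we arrive: {p_home:.0%}")
```

Because the components here are simply added, the same answer could be had analytically. The simulation earns its keep once components interact or stop being normal.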
From this we could say, “there is a 63% chance mom will be home when we arrive”.
We could also say, “there is only a 5% chance mom will arrive more than 30 minutes after we arrive”.
When we know there is roadwork starting on a segment we can add another delay component (based on its 90% confidence interval) and rerun the simulation.
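As a rough sketch of such a rerun, the snippet below compares the chance before and after adding a roadwork delay. The helper `chance_home`, the leg intervals, the roadwork interval of 1 to 9 minutes, and the 34 minutes available are all hypothetical.

```python
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # half-width of a 90% interval, in sigmas

def chance_home(intervals, minutes_available, iterations=10_000, seed=1):
    """Share of simulated drives that finish within minutes_available."""
    rng = random.Random(seed)
    # Each 90% interval becomes a (mean, sigma) pair for a normal draw.
    params = [((lo + hi) / 2, (hi - lo) / (2 * Z90)) for lo, hi in intervals]
    hits = 0
    for _ in range(iterations):
        total = sum(rng.gauss(mu, sigma) for mu, sigma in params)
        hits += total <= minutes_available
    return hits / iterations

usual_legs = [(2, 6), (5, 11), (10, 20), (3, 7)]  # hypothetical, in minutes
roadwork = (1, 9)                                 # hypothetical extra delay

before = chance_home(usual_legs, minutes_available=34)
after = chance_home(usual_legs + [roadwork], minutes_available=34)
print(f"before roadwork: {before:.0%}, with roadwork: {after:.0%}")
```

The only change between the two runs is one more interval in the list, which is what makes this kind of what-if question cheap to ask.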
That is the power of statistical modeling: we can estimate unknowns from our initial estimates and state our level of confidence in the final answer.
Now what does this have to do with Pinterest revenue?
Read my article in Gigaom