Note: The statistical analysis shown here is based on data provided by one VC firm, BlueRun Ventures. The ratings were likely assigned post hoc and carry biases. Hence the results are not as general as the title suggests and carry considerable uncertainty. This is also a long article that relies on linear regression and logistic regression.
Imagine you were asked to invest in ten startups. Given numerical ratings on the Team, Product, Market and Traction but knowing nothing about the specifics of the team, the exact product or the domain they play in, can you pick those that actually received a term sheet? Take this quiz and see how you do. Do not read ahead before you do the quiz.
What characteristics of a startup make it attractive for venture capitalists to invest in? If you are a startup founder preparing for that pitch, wouldn’t it be nice to know the answer so you can maximize your chances of getting that coveted term sheet? For those who are listening, there is no scarcity of advice. Everyone, from VCs to startup founders who secured funding at significant valuations to others on the sidelines, has something to say.
Are any of these relevant to startup founders? What is noise and what is signal? Do any of these have hard numbers behind them?
Until now there was no hard quantitative data on startups that pitch to VCs and the outcome. Thanks to Jay Jamison, partner at BlueRun Ventures, I have data on 216 startups that pitched to his firm. Jamison rated them on four metrics, Team, Product, Market and Traction, using a 5-point scale and also noted the outcome of their pitch. The outcome is rated as the likelihood of getting a term sheet on a five-point scale, with 5 meaning they got it.
Armed with this data, we can now use statistical analysis to model whether any of these traits of a startup influence its ability to get a term sheet. While Jamison did an initial analysis himself, it was not rigorous enough and pointed to incorrect reasons. He later shared his data with me and encouraged me to analyze it in not one but two ways to come up with a prediction model.
Stepwise Linear Regression
Let us say there is only one independent variable X and one outcome variable Y. Suppose we had several pairs of these, (x1, y1), (x2, y2), …, based on our observations. A linear regression model tries to find a line of the form Y = mX + C that is the best possible fit, the one with the least total squared error, given the set of observations.
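To make the idea of a best-fit line concrete, here is a minimal sketch of the standard least-squares formulas for m and C. The function name fit_line and the five data points are made up for illustration; they are not taken from the BlueRun data.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = m*X + C for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical observations that lie exactly on Y = 2X + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = fit_line(xs, ys)
print(m, c)
```

With real, noisy ratings the points will not sit on the line, and the gap between the points and the line is what the goodness-of-fit measure summarizes.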
How well this model explains changes in Y is measured by R2, the coefficient of determination, which is computed from the ratio of two errors. Khan Academy has a very nice explanation of R2 that I recommend you check out. For a least-squares fit it has a maximum value of 1 and a minimum of 0. The higher the value, the better the fit.
What has that got to do with startups and venture funding? We will model the outcome, whether or not the startup got a term sheet, as a function of the four traits. We will build the model that has the best fit and also find how good it is at predicting the outcome.
In any regression model, if you try to model with the maximum set of variables you will find a very good fit with a very high R2. Such a model is useless. We want to find the minimal set of variables that we can control and also measure how the predictability of the model improves as we add variables one at a time. That is stepwise linear regression.
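As a rough illustration of that ratio, R2 can be computed as one minus the model's squared error divided by the squared error of always guessing the mean. The sketch below uses a hypothetical helper r_squared and made-up numbers to show the two extremes.

```python
def r_squared(ys, predictions):
    """R2 = 1 - (error of the model) / (error of predicting the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical outcomes and two extreme sets of predictions.
ys = [1, 2, 3, 4, 5]
r2_perfect = r_squared(ys, [1, 2, 3, 4, 5])    # predictions match exactly
r2_mean_only = r_squared(ys, [3, 3, 3, 3, 3])  # always guess the mean
print(r2_perfect, r2_mean_only)
```

A model that predicts every outcome exactly scores 1, and a model no better than guessing the average scores 0.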
Step 1: Modeling the term sheet outcome with each of the four variables separately, I found that Team alone stands out as a very good predictor, with R2 of 34%. That is, 34% of the changes in outcome are explained by changes in Team and 66% are not explainable by Team. This does, however, seem to fit the commonly accepted notion that VCs invest in teams and not products.
Step 2: This step is to build yet another model that retains the Team variable from step 1 and tries to add one more from the remaining three. The second variable that has the most positive impact in improving the predictability? Market. But it did not improve the model’s predictability much. Adding Market moved the R2 only by 10%, meaning Market characteristics have very low predictability.
Step 3: You get the picture. The third variable is Traction and it did even worse with just 5% increase in R2.
Step 4: There is no step 4. The left-out variable, Product, had absolutely no role to play in predicting the outcome. If you are obsessing about the product, its features and how well it compares against the others in the market, none of that has any impact whatsoever in tipping VCs’ decisions. The product is not relevant.
So the only startup characteristic with meaningful predictability for getting a term sheet, according to the linear regression model, is how good a team you have assembled.
Now to an even bigger surprise.
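The stepwise procedure described in the steps above can be sketched as a forward-selection loop: at each step, add the variable that raises R2 the most and record the new R2. This is only an illustration under stated assumptions. The ratings below are synthetic (the outcome is constructed as 2 x Team + Market), the function names are hypothetical, and the min_gain stopping threshold is my own choice, not something taken from the original analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r2_for(columns, y):
    """R2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def forward_stepwise(data, y, min_gain=0.01):
    """Add one variable at a time, keeping the one that improves R2 most."""
    chosen, history, best_r2 = [], [], 0.0
    remaining = list(data)
    while remaining:
        scores = {v: r2_for([data[c] for c in chosen + [v]], y) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break  # assumed stopping rule: the gain is too small to matter
        best_r2 = scores[best]
        chosen.append(best)
        remaining.remove(best)
        history.append((best, best_r2))
    return history


# Synthetic ratings for ten imaginary startups (not the BlueRun data).
data = {
    "Team":    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "Market":  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Product": [3, 1, 4, 1, 5, 2, 2, 5, 3, 4],
}
y = [2 * t + m for t, m in zip(data["Team"], data["Market"])]
history = forward_stepwise(data, y)
for name, r2 in history:
    print(name, round(r2, 3))
```

On this toy data the loop picks Team first, then Market, and stops before adding Product because it contributes nothing further. That outcome is a property of how the synthetic outcome was built and says nothing about the real dataset.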
Jamison rated the term sheet outcome as a likelihood on a 1 to 5 scale. But if you take a closer look at his intended meaning, it was really a binary coding: 5 means they gave a term sheet to the startup and 1-4 means they said no in four different ways. The outcome is Yes or No. So we should not be running linear regression at all with such binary coding. The right analysis is logistic regression, which estimates the probability that a startup with given characteristics will get a term sheet. So I recoded the term sheet values as 0 and 1 and did just that.
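The recoding and the logistic fit can be sketched as follows. This is a bare-bones illustration with a single predictor, fitted by plain gradient ascent on the log-likelihood rather than by the routines a statistics package would use. The names recode and fit_logistic, the learning rate, the step count and the ten data points are all assumptions made for the example.

```python
import math


def recode(outcomes):
    """Map the 1-5 outcome rating to binary: 5 -> 1 (term sheet), else 0."""
    return [1 if o == 5 else 0 for o in outcomes]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Synthetic Market ratings and 1-5 outcome ratings (not the BlueRun data).
market = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
outcome = [1, 2, 1, 3, 5, 2, 4, 5, 5, 4]
accepted = recode(outcome)
b0, b1 = fit_logistic(market, accepted)
p_low = sigmoid(b0 + b1 * 1)   # estimated probability at Market = 1
p_high = sigmoid(b0 + b1 * 5)  # estimated probability at Market = 5
print(accepted, round(p_low, 3), round(p_high, 3))
```

Unlike the straight line of linear regression, the fitted curve always stays between 0 and 1, which is why it suits a Yes or No outcome.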
Even in this model the Product has no role to play. That should settle the argument with the product types obsessing over details.
The biggest surprise? The biggest predictor in the linear model, Team, and the smallest predictor, Traction, have absolutely no role in predicting the outcome. The biggest predictor, with close to 80% predictability (measured by McFadden's R2, the analogue used for logistic regression), is the Market rating. The model is in fact really simple. If the Market rating was 5, the startup got funding; if not, it didn’t. Play in the hottest market and you get funding regardless of the other factors.
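For readers unfamiliar with McFadden's R2, it compares the log-likelihood of the fitted model with that of a "null" model that predicts the same base rate for every startup. A minimal sketch, with hypothetical function names and made-up probabilities:

```python
import math


def log_likelihood(ys, probs):
    """Log-likelihood of binary outcomes under predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(ys, probs))


def mcfadden_r2(ys, probs):
    """McFadden's pseudo R2: 1 - LL(model) / LL(null model)."""
    base_rate = sum(ys) / len(ys)
    ll_null = log_likelihood(ys, [base_rate] * len(ys))
    ll_model = log_likelihood(ys, probs)
    return 1 - ll_model / ll_null


# Hypothetical outcomes and two sets of predicted probabilities.
ys = [0, 0, 1, 1]
r2_uninformative = mcfadden_r2(ys, [0.5, 0.5, 0.5, 0.5])
r2_confident = mcfadden_r2(ys, [0.1, 0.1, 0.9, 0.9])
print(round(r2_uninformative, 3), round(r2_confident, 3))
```

A model that only reproduces the base rate scores 0, while a model that assigns high probability to the startups that were funded and low probability to the rest scores close to 1.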
This leads to unfortunate conclusions about startups and how VCs make investment decisions.
One, money flows based on the buzz and hype. The very rating of the Market attribute is questionable. Are VCs rating the market based on true value or the prevailing hype?
Two, money flows where there is already a lot of money. So more startups that play in the same hot area get funded, leading to too many players in a perceived hot market. The result is many startups that are hard to distinguish from each other, fragmentation and likely too many failures.
Three, many reasonable markets that have steady growth but lack the buzz attract no funding and hence no startups, resulting in no meaningful innovation. This likely explains the credo of Peter Thiel’s Founders Fund, “We wanted flying cars, instead we got 140 characters”.
So what is relevant to the startups? It is not really black or white. Given the investment environment and the unavoidable hype in the valley, if you want to play the game just for funding then you may do well by pitching yet another social/mobile/big data or whatever the flavor of the day is.
If you have a truly meaningful innovation that is a lot more than 140 characters and a team that is unmatched in its technical expertise, you will do well by waiting to find your match.