Please don’t embarrass us by having a business model for your startup – VCs

When I wrote the valuation model for Pinterest, many people wrote to me to point out that Pinterest had stopped using SkimLinks and hence stopped making any revenue. Making revenue, it turns out, is not good for your startup (don’t quote this line out of context).

How do you value an investment? Any investment, be it a farm, shares of an established enterprise or a tech startup?

The simple answer is that you look at the profits it generates now and how they are expected to grow.

But that level of transparency and simplicity is a problem for venture capitalists investing in Silicon Valley’s startups. The New York Times Bits blog says why VCs don’t want the startups to show any viable business model let alone profits,

“It serves the interest of the investors who can come up with whatever valuation they want when there are no revenues,” explained Paul Kedrosky, a venture investor and entrepreneur. “Once there is no revenue, there is no science, and it all just becomes finger in the wind valuations.”

With any hint about the business model, one can come up with a reasonable valuation model for any business. Granted, one has to make assumptions to get there, but we can quantify the uncertainties in those assumptions and state our valuation in terms of probabilities, which can be used to place bets (I mean investments).

That is not good because

they’re interested in pumping up enough hype and valuation to find a quick exit through an acquisition at an eye-popping premium.

How else can you justify a $200 million valuation for Pinterest when its chances of making revenue that justifies such a valuation are less than 0.25 percent?

This seems to explain why VCs advise startups to give their product away for free and why VCs don’t advise startups about customer segments and filling an urgent need. As Stanford’s Pfeffer says,

These companies are simply being founded to be bought.

Not to fill an urgent need and to take their fair share of value created.

Google Customer Surveys – True Business Model Innovation, But

Summary: Great business model innovation that points to the future of unbundled pricing. But is Google Customer Surveys an effective marketing research tool? Do not cancel your SurveyGizmo subscription yet.

Google’s new service, Customer Surveys, is truly a business model innovation. It unlocks value by creating a three sided market:

  1. Content creators who want to monetize their content in an unbundled fashion (charge per article, charge per access etc)
  2. Readers who want access to paid content without having to subscribe for entire content or muddle through micro-payments (pay per access)
  3. Brands that seek customer insights and are willing to pay for them, but have been unable to find a reliable or cheaper way to get them

When readers want to access premium content, they can get it by answering a question posed by one of the brands instead of paying for access. Brands create surveys using Google Customer Surveys and pay per user response.

Google charges brands 10 cents per response, pays 5 cents to the content creators and keeps the rest for enabling this three sided market.

Business model is nothing but value creation and value capture. Business model innovation means innovation in value creation, capture or both. By adding a third side with its own value creation and capture Google has created an innovative three way exchange to orchestrate the business model.
This also addresses the problem with unbundled pricing, mostly operational challenges with micro-payments and metering.

But I cannot help but notice severe sloppiness in their product and messaging.

Sample size recommendation: Google recommends that brands sign up for 1500 responses. Their reason: “recommended for statistical significance”.
Statistical significance has no meaning for surveys unless you are doing hypothesis testing. When brands are trying to find out which diaper bag feature is important, they are not doing hypothesis testing.

What they likely mean is a confidence interval (or margin of error at a certain confidence level). What is the margin of error at a 95% confidence level? With 1500 samples, assuming a population of 200 million, it is about 2.5%. But you do not need that level of precision, given that you already have sampling bias from opting for Google Customer Surveys. Most would do well with a 5% margin of error, which requires only 385 responses, or 10%, which requires only 97 responses.
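The numbers above can be checked with the textbook formula for the margin of error of a proportion. This is a minimal sketch, not anything Google publishes: it assumes a simple random sample, the worst-case proportion p = 0.5, and a z-score of 1.96 for 95% confidence. It ignores the finite population correction, which is negligible for a population of 200 million. The function names are mine.

```python
import math

Z_95 = 1.96  # two-sided z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


print(round(margin_of_error(1500) * 100, 1))  # 2.5 (percent)
print(required_sample_size(0.05))             # 385
print(required_sample_size(0.10))             # 97
```

Halving the margin of error roughly quadruples the number of responses you pay for, which is why the choice of 1500 matters for the bill.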

Recommending 1500 responses is at best a deliberate pricing anchor, at worst an error.

If they really mean hypothesis testing, one can use a survey tool for that, but it is not coming through in the rest of their messaging which is all about response collection. The 1500 responses suggestion is still questionable. For most statistical hypothesis testing 385 samples are enough (Rethinking Data Analysis published in the International Journal of Marketing Research, Vol 52, Issue 1).

Survey of one question at a time: Brands can create surveys that have multiple questions in them but respondents will only see one question at any given time.
Google says,

With Google Consumer Surveys, you can run multi-question surveys by asking people one question at a time. This results in higher response rates (~40% compared with an industry standard of 0.1 – 2%) and more accurate answers.
It is not a fair comparison regarding response rate. Besides, we cannot ignore the fact that the response may be just a mindless mouse click by a reader anxious to get to their article. For the same reason they cannot claim “more accurate”.

Do not cancel your SurveyGizmo subscription yet. There is a reason why marketing researchers carefully craft a multiple question survey. They want to get responses on a per user basis, run factor analysis, segment the data using cluster analysis or run some regression analysis between survey variables.

Google says,

The system will automatically look for correlations between questions and pull out hypotheses.

I am willing to believe there is a way for them to “collate” (not correlate, as they say) the responses to multiple questions of the same survey by each user and present them as one unified response set. If you can string together responses to multiple questions on a per user basis, you can do all the statistical analysis I mentioned above.

But I do not get what they mean by “look for correlations between questions” and definitely don’t get “pull out hypotheses”. It is us, the decision makers, who make the hypothesis in hypothesis testing. We are paid to make better hypotheses that are worthy of testing.

If we accept the phrase, “pull out hypotheses”, to be true then it really means we need yet another data collection process (from a completely different source) to test the hypotheses they pulled out for us. Because you cannot use the very data you used to form a hypothesis to test it as well.

Net-Net, an elegant business model innovation with severe execution errors.

What are the chances mom will be home when we arrive and what does this have to do with Pinterest revenue?

Update: This article will help you understand my Gigaom guest post on Pinterest revenue: How much does Pinterest actually make?

One of the games my 7 year old and I play while driving home from her school is guessing whether mom will be home when we arrive.

I ask, “What are the chances mom will be home when we arrive?”
She would almost always reply, “50-50”.
Not bad for someone just learning to enumerate the possibilities and find the fraction. When we arrive home, either mom is there or she is not. So 50-50 seems reasonable.

But are the chances really 50-50? If not, how would we find them?

Well, let us start with some safe, feel-good assumptions: my drive time is constant, there is a mom, she always leaves at a fixed time, and she will arrive.
Other than that we need to know

  1. What time is it now?
  2. What is her mean drive time?
  3. What is the standard deviation of drive time?

Assume that her drive times are normally distributed with the stated mean and standard deviation. It is then a question of finding in what percentage of the scenarios her drive time gets her home before we arrive. That is the probability we were looking for, and it is not 50-50 simply because there are only two outcomes.

Here we built a very simple model. But who knows what the mean is, let alone the standard deviation? We do not. So we do the next best thing: we estimate. We do not literally estimate the mean and standard deviation; instead we estimate a high and a low value such that in 90% of the cases the drive time falls in that range. Stated another way, there is only a 10% chance the drive time is outside this range.

This is the 90% confidence interval. We are 90% confident the value is in this interval. Once we have this, it takes only simple math to find the mean and standard deviation.

Mean is the average of the low and high values.
Standard deviation is the difference between high and low divided by the number of standard deviations that the 90% probability spans on a standard normal curve (3.29, that is, 1.645 on either side of the mean).

Once you have the mean and standard deviation, you can do the math to find the percentage of scenarios where the drive time is below a certain value.
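Here is a minimal sketch of that math in Python, using the standard library's `NormalDist`. The 25 to 45 minute interval and the 38 minute cutoff are hypothetical numbers I made up for illustration; plug in your own estimates.

```python
from statistics import NormalDist

# Width of a 90% interval in standard deviations: 2 * 1.645, about 3.29
Z_SPAN_90 = 2 * NormalDist().inv_cdf(0.95)


def normal_from_90ci(low, high):
    """Turn a 90% confidence interval estimate into a normal distribution."""
    mean = (low + high) / 2
    sigma = (high - low) / Z_SPAN_90
    return NormalDist(mean, sigma)


# Hypothetical estimate: mom's drive takes 25 to 45 minutes in 90% of cases.
drive_time = normal_from_90ci(25, 45)

# Hypothetical cutoff: she is home before us if her drive takes under 38 minutes.
p_home = drive_time.cdf(38)

print(f"mean = {drive_time.mean:.1f} min, sigma = {drive_time.stdev:.2f} min")
print(f"chance mom is home when we arrive = {p_home:.0%}")
```

As a sanity check, the distribution built this way puts 5% of the probability below the low estimate and 5% above the high estimate.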

This is still simple. We treated drive time as the measurable quantity and worked with it. But drive time is likely made up of many different components, each a random variable of its own: for instance, time to get out of the parking lot, time to get on the highway, etc. There is also the possibility that the start time is no longer fixed and varies.

(If you want to build more realistic model you should also model my drive time as random variable with its own 90% confidence interval estimate. But let us not do that today.)

In such a case, instead of estimating the whole, we estimate 90% confidence intervals for all these parts. In fact this is the better and preferred approach, since we are estimating smaller values for which we can make better and tighter estimates than we can for total drive time.

How do we go from 90% confidence interval estimates of these component variables to the estimate of drive time? We run a Monte Carlo simulation to build the normal distribution of the drive time variable based on its component variables.

This is like imagining driving home 10,000 times. For each iteration, randomly pick a value for each one of the component variables based on its normal distribution (mean and sigma) and add them up:

drive time (iteration n) = exit time (n) + getting to highway time (n) + …

Once you have these 10,000 drive times then find what percentage of the scenarios have drive time less than certain value. That is the probability we were looking for.
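The simulation described above can be sketched in a few lines of Python. Treat it as an illustration under stated assumptions: the component names and their 90% confidence intervals are hypothetical, the components are assumed independent, and each is drawn from a normal distribution (which can in principle produce a small negative time, something I ignore here).

```python
import random
from statistics import NormalDist

# Width of a 90% interval in standard deviations, about 3.29
Z_SPAN_90 = 2 * NormalDist().inv_cdf(0.95)

# Hypothetical 90% confidence intervals, in minutes, for each component
COMPONENTS = {
    "exit parking lot": (2, 6),
    "get to highway": (5, 15),
    "highway": (12, 25),
    "local streets": (4, 10),
}


def simulate_drive_times(components, iterations=10_000, seed=42):
    """Imagine the drive `iterations` times; return the total time of each run."""
    rng = random.Random(seed)
    # Convert each (low, high) estimate to (mean, sigma)
    params = [((lo + hi) / 2, (hi - lo) / Z_SPAN_90) for lo, hi in components.values()]
    return [sum(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(iterations)]


def chance_below(drive_times, threshold):
    """Fraction of simulated drives that took less than `threshold` minutes."""
    return sum(t < threshold for t in drive_times) / len(drive_times)


drive_times = simulate_drive_times(COMPONENTS)
print(f"chance drive time is under 42 min = {chance_below(drive_times, 42):.0%}")
```

Adding a new source of delay is just one more entry in `COMPONENTS` followed by a rerun.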

From this we could say, “there is 63% chance mom will be home when we arrive”.

We could also say, “there is only 5% chance mom will arrive more than 30 minutes after we arrive”.

When we know there is roadwork starting on a segment we can add another delay component (based on its 90% confidence interval) and rerun the simulation.

That is the power of statistical modeling to estimate any unknowns based on our initial estimates and state our level of confidence on the final answer.

Now what does this have to do with Pinterest revenue?

Read my article in Gigaom

Can you answer the “Why” questions with your segmentation strategy?

Read these three quotes about customer segments and write down what you think about their definitions

It’s a higher-educated, higher-income user that resides in the Northeast. More often than not, the Greek yogurt consumer is a female. – Analyst on Chobani target segment

Verano (Buick’s new compact car) is primarily aimed at younger professionals, in their 30s, and empty-nesters accustomed to driving luxury cars but needing less space now. – Buick’s Product Marketing Director

McDonald’s said that in Europe, “severe winter weather in certain markets negatively impacted the segment’s overall February results.” In the U.S., the company cited strength in the sales of Chicken McBites, Filet-O-Fish, signature beverages and breakfast items.

Say the word “segment” and the responses fall into a predictable range:

  1. Industry verticals
  2. Size of enterprises served (Small, Medium, Large)
  3. Geographies
  4. Age brackets
  5. Education levels
  6. Gender
  7. Everyone

The first three are the most common (and arguably the only type of) segmentation practiced by B2B companies. In all these cases segmentation is done after the fact, to analyze the revenue mix (where is the revenue coming from) and not as a driver for product and pricing strategy. B2B segmentation is more about sales team design and marketing resource allocation than about customer driven product development.

The last five are the answers of those in B2C companies. Unlike B2B, here segmentation is not an afterthought. It is used to guide product strategy and product mix, as you can see from McDonald’s example and Verano example above. Undeniably this is a better approach than what is practiced by B2B companies. But let us stop and think about it for a moment

  1. Why do highly educated, higher income people in North-East prefer Chobani?
  2. Why is Verano attractive to both young professionals in their 30s and empty nesters?
  3. Why do McDonald’s US customers prefer its breakfast items?

The Whys cannot be answered here.

All seven types of segmentation we saw above are based on externally observable factors rather than the intrinsic need of the customers buying the product. That is the problem with both B2B and B2C segmentation schemes. Businesses confuse externally observable differences and easily measurable distinctions with segmentation and fail to ask the most important question:

Why will the customer hire the product?

Stated differently by Clayton Christensen

What job is the customer hiring the product for?

Everything else – the product mix, product characteristics, customer mix and their demographics, geography – is at best secondary and at worst a red herring.

Answering the Why question is the only right starting point for your segmentation strategy.

If you cannot answer Why you do not have segmentation.

I started this article with quotes. Here’s another, from a successful businessman known for his basketball victories:

Because Republicans buy sneakers too – Michael Jordan when asked why he didn’t endorse the Democratic Senate candidate against Republican Jesse Helms

Given data on the prevalence of links in Retweets, can you tell the chances that a tweet with a link will be Retweeted?

There are many articles on what increases your Tweet’s chances of getting Retweeted.

There is one article I saw that found that

  1. Among the tweets that were Retweeted, 56.7% had a link in them
  2. Among the tweets that were not Retweeted, 19% had a link in them

Given this data, can you tell the chances of your Tweet with link getting Retweeted?

State your confidential answer here and see the solution to the problem.
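For those who want to play with the numbers, here is a minimal sketch of Bayes' rule applied to this data. The two percentages come from the article; the base rates (what share of all tweets get Retweeted) are not in the data, so the three values I try are purely hypothetical.

```python
def p_retweet_given_link(base_rate, p_link_given_rt=0.567, p_link_given_not_rt=0.19):
    """Bayes' rule: chance a tweet is Retweeted given that it has a link.

    base_rate is the share of all tweets that get Retweeted. It is NOT
    part of the published data, so it has to be assumed.
    """
    numerator = p_link_given_rt * base_rate
    denominator = numerator + p_link_given_not_rt * (1 - base_rate)
    return numerator / denominator


# Hypothetical base rates, only to show how much the answer moves
for base_rate in (0.01, 0.05, 0.20):
    chance = p_retweet_given_link(base_rate)
    print(f"base rate {base_rate:.0%}: chance of Retweet given a link = {chance:.1%}")
```

The answer swings widely with the assumed base rate, so the two published percentages alone do not pin it down.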

Note 1: I am not attesting to the validity of this data nor am I supporting any other claims made in the rest of the article.

Note 2: Keep in mind the fallacy of composition: what is true of a part cannot be treated as true of the whole. As Sowell nicely put it, “If one person stands up in a stadium, she can see better than others. But if everyone stands up …”

Note 3: The Fast Company article that gave us this data says in its title, “scientifically proven ways to get retweeted”. I do not get it. They did not start with a hypothesis, did not do a controlled experiment, and are not looking for evidence that would falsify their claims. How can this be scientifically proven?