Price as the first choice attribute or last – Pricing Page Recommendation

Take a quick look at the pricing pages of most web services and products. Most offer 3 or 4 versions that differ in features, usage (number of users, responses, etc.), and of course price. In every pricing page I visited (a sampling, not a comprehensive survey), the first attribute listed is price. Some pricing pages use font size and other highlighting to make price prominent.

What if price isn’t the first attribute you present to your customers?

What if your pricing page pitches the benefits of each version before it talks about price?

What if price is the last attribute for each version listed in your pricing page?

Last week I wrote about the difference between the Price leader and the Price-Less leader. The core idea was to start the conversation with your customers about every attribute but price. When price is not prominent, you get to talk to customers about the factors that are relevant to them.

A version of the Price-Less leader concept was published in the Journal of Marketing Research, December 2009. The article used the term “Benefits leader” instead of “Price-Less leader”, and the authors reported a very relevant finding:

“When customers choose a benefits leader (purely based on benefits, without price information), they tend to stick with that choice even when the price information is revealed. Even when faced with a higher price, they tend to stick with their benefits-based choice.”

Applying these findings to pricing pages, I hypothesize that when price is listed as the last attribute:

  1. More customers will pick your higher-priced versions
  2. More customers will sign up for your basic version (higher conversion)

This hypothesis is based on prior pricing research from a different context, so it is worth testing on your pricing page before you roll it out. It is definitely worth adding to the A/B testing you are probably already doing for the rest of your pages. I recommend this A/B test despite my earlier warnings about A/B testing.

Note that I am not recommending that you hide the price entirely or show it only after customers sign up – I am recommending that you move the price to be the last attribute you list under each version.

I am very interested in hearing your results. Send me a note, even if you did not find a statistically significant difference.

For the analytically inclined: if you do not want to do the traditional A/B testing, you can use Bayesian methods. But I do not recommend a full-blown Bayesian analysis in this case.

The Long and Cross of Analytics

Anytime you see results from studies, especially those that offer causation arguments, here is one qualifying question you should ask to determine whether the study is worth your time and whether its causation arguments have any merit:

Is this a cross-sectional study or a longitudinal study?

Cross-sectional: This is the easiest kind of study to conduct, so everyone does it. The only conclusion anyone can draw from such a study is correlation; claiming causation is simply wrong. A cross-sectional study analyzes a cross-section of the target set (be it businesses, customers, etc.) at a given point in time. It classifies the target set into “winners” and “losers” based on a metric the researcher chooses and looks for traits present in one group and not the other – positive traits that winners have and losers lack, and negative traits that losers have and winners lack. Then it hints at, or outright claims, that the winners are winners because of their positive traits and their lack of negative traits.

Cross-sectional studies follow the general pattern:

“Seven/Eight/etc. habits/traits/etc. of enormously successful individuals/businesses/entrepreneurs/bloggers”

Longitudinal: A longitudinal study is very hard to conduct, and it takes time (literally). It analyzes a target set over a period of time. It too identifies winners and losers, but not based on traits – rather, based on some action taken or condition present at the start of the study period. Such a study follows the performance of those with the condition and those without over the period and measures the difference in their performance. Some simply take point measurements at two different points in time.

Longitudinal studies follow the general pattern:

“Businesses/Entrepreneurs/Individuals who employed factor/action/method saw their performance increase by x% over 7/8/9 years”

We see a lot more cross-sectional studies than longitudinal studies.

If it is cross-sectional, I recommend you take a pass! It has all kinds of biases and flaws, most notably confusing correlation with causation.

If it is longitudinal, give it a second look. But resist the temptation to accept the causation suggested by these studies: you still need to be aware of lurking variables and survivorship bias, and most importantly beware of studies that make time flow backward.

8 Flaws in A/B Split Testing

You have been using A/B split testing to improve your mail campaigns and web designs. The core idea is to randomly assign participants to group A or B and measure the resulting performance – usually in terms of conversion. Then you perform statistical testing, either a t-test (incorrect) or a Chi-square test, to see if the difference in performance between A and B is statistically significant at the 95% confidence level.
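For concreteness, here is what that Chi-square step looks like – a minimal pure-Python sketch of the test of independence on a 2x2 conversion table (the conversion counts are made up for illustration), not a replacement for your test tool:

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Chi-square statistic for a 2x2 conversion table.

    Compare the result against 3.841, the critical value for
    1 degree of freedom at the 95% confidence level.
    """
    observed = [[conv_a, n_a - conv_a],
                [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical data: 100/1000 conversions for A vs 150/1000 for B
stat = chi_square_2x2(100, 1000, 150, 1000)
print(stat > 3.841)  # True – significant at the 95% level
```

With these numbers the statistic is about 11.4, well past the 3.841 cutoff – which is exactly the kind of result the flaws below should make you suspicious of.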

There are significant flaws with this approach:

  1. Large Samples: Large samples will find statistical significance even for trivially small differences. And when you use large samples (larger than 300), you lose segmentation differences.
  2. Focus on Statistical Significance: Every test tool, sample size calculator, and article is narrowly focused on achieving statistical significance, treating that as the final word on the superiority of one version over the other.
  3. Ignoring Economic Significance: Whether or not there is statistical significance, no test tool will tell you the economic significance of the result for your decision making.
  4. Misleading Metrics: When tools report that version A is X% better than version B, they are simply wrong. The hypothesis tested in A/B testing is only that one version is better than the other, not by what percent.
  5. All or Nothing: When the test results are inconclusive, there is nothing to learn from these tests.
  6. Discontinuous: There is no carryover of knowledge gained from previous tests; we do not apply what we learned in one test to later tests.
  7. Test Everything and Test Often: The method wrests control from the decision maker in the name of being “data driven”. It pushes one to suspend all prior knowledge (dismissed as hunches and intuition) and to test everything and test often, resulting in significant costs for minor improvements. Realize that the test tool makers are incentivized by your regular and excessive testing.
  8. Mistaking X Implies Y for Y Implies X: The hypothesis testing itself is misapplied. What we test is, “how well does the data fit the hypothesis we assumed?” But at the end of the test we state, “the hypothesis is supported by the data and is true for all future data.”
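The first flaw is easy to demonstrate: hold the conversion rates fixed and grow the sample, and the same one-point difference flips from "not significant" to "significant". A sketch using a pooled two-proportion z statistic, with hypothetical rates and sample sizes:

```python
import math

def two_prop_z(p1, p2, n):
    """Pooled two-proportion z statistic, assuming equal group sizes n."""
    p = (p1 + p2) / 2                      # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (2 / n))  # standard error of the difference
    return (p1 - p2) / se

# The same 10% vs 11% difference at three different sample sizes;
# |z| > 1.96 means "significant at the 95% level"
for n in (500, 5000, 50000):
    print(n, abs(two_prop_z(0.10, 0.11, n)) > 1.96)
# 500   False
# 5000  False
# 50000 True
```

Nothing about the underlying difference changed between the three runs – only the sample size did.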

The root cause of all these mistakes is using A/B testing for decision making. When you are deciding between two versions, you are deciding which option will deliver better returns. The uncertainty is in choosing the version. If there is no uncertainty at all, why bother testing?

The way to reduce uncertainty is to collect relevant information. It is profitable to do so only if the cost of collecting the information is less than the expected increase in return from reducing the uncertainty.

You are not in the hypothesis testing business. You are in the business of adding value for your shareholders (that is, you and your investors). To deliver value you need to make decisions in the presence of uncertainty. With all its flaws, A/B testing is not the right tool for decision making!

So stop using A/B testing!

What do I recommend? Send me a note to read a preview of my article on “Iterative Bayesian (TM)”.

Use of Information Priors in A/B Testing

Last time I wrote about the use of prior knowledge in A/B testing, there was considerable push back from the analytics community. I think I touched a nerve when I suggested using “how confident you were before the test” to interpret the results after the test. While the use of such information may sound like arbitrary gut feel, we must recognize that we implicitly use considerable information priors in A/B testing. The Bayesian methods I used just make those implicit assumptions explicit.

When you finally get down to testing two (or three) versions with A/B split testing, you have implicitly eliminated many other versions. You should stop and ask why you are not testing every possible combination. The answer is that you applied tacit knowledge – based on your own prior testing or on well-established best practices – and eliminated many versions that required no testing. That is the information prior!

Now let us take this one step further. Of the two versions you selected, make a call on how confident you are that one will perform better than the other. This can be based on prior knowledge about the design elements and user experience, or it can be a biased estimate. That should not trouble you; after all, we all seem to find reasons why one version performed better than the other after the fact. That after-the-fact reasoning suffers from hindsight bias, whereas I am simply asking you to state your prior expectation of which version will perform better.

Note that I am not asking you to predict by how much, only how confident you are that there will be a real (not merely statistically significant, but economically significant) difference between the two versions. Write this down before you start testing, not after (I prefer to call A/B testing “collecting data”). As long as the information is obtained through methods other than the test in question, it is a valid prior. It may not be precise, but it is valid.

What we have is the application of information priors in A/B testing – valid and relevant.
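One way to make such a prior explicit – a sketch under my own assumptions, not a prescription – is to encode your pre-test confidence as a Beta distribution and let the observed conversions update it. All counts and prior parameters below are hypothetical:

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

def prob_b_beats_a(a_conv, a_n, b_conv, b_n,
                   prior_a=(1, 1), prior_b=(1, 1), draws=100_000):
    """Monte Carlo estimate of P(conversion rate of B > rate of A).

    Priors are Beta(alpha, beta); a flat Beta(1, 1) means 'no prior opinion'.
    Observed successes and failures simply add to the Beta parameters.
    """
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(prior_a[0] + a_conv,
                                    prior_a[1] + a_n - a_conv)
        rate_b = random.betavariate(prior_b[0] + b_conv,
                                    prior_b[1] + b_n - b_conv)
        wins += rate_b > rate_a
    return wins / draws

# A prior leaning mildly toward B, then 40/500 vs 55/500 observed conversions
p = prob_b_beats_a(40, 500, 55, 500, prior_a=(5, 5), prior_b=(6, 4))
print(round(p, 2))  # roughly 0.95 with these made-up numbers
```

The prior here is deliberately weak – equivalent to ten imaginary visitors per version – so a few hundred real observations dominate it, which is exactly how a stated-up-front confidence should behave.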

Next up, I will ask you to get rid of the test for statistical significance and look at A/B testing as a means to reduce uncertainty in decision making.

The Role of Information is to Reduce Uncertainty

Why do we need to do marketing research, collect analytics, perform A/B testing, and conduct experiments?

  1. To find out whether the Highest Paid Person’s Opinion (HiPPO) is true?
  2. To pick the clear winning option?
  3. To satisfy our ego that we drive decisions based on analytics?

The real purpose of all these methods of data collection is to reduce uncertainty in our decision making. Decision making, after all, is about making choices. If there are no choices, or you have already made your choice, then there is no real decision making.

If you have options but are not certain which one to go with, then there is uncertainty – or, more precisely, an unacceptable level of uncertainty. If it were acceptable – that is, if the expected results are not that different – then there is no real decision making either. Just flip a coin and go with it (my article from 2009).

If the level of uncertainty is unacceptable – that is, choosing the wrong option means the difference between life and death, or profit and loss – then it may be worth reducing this uncertainty, provided the cost of the information is less than the value differential.

Conversely, if the information you have or collect does nothing to reduce uncertainty in decision making, then it is irrelevant, regardless of how plentiful it is, how statistically significant it is, or how easy or cheap it is to collect.
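A back-of-the-envelope version of that cost test, with made-up numbers: compare the expected return of acting on your prior alone against the return under (perfect) information, and pay for data collection only if it costs less than the gap.

```python
# Hypothetical numbers: expected annual returns and a prior belief
p_a_better = 0.6        # prior confidence that version A is the better one
payoff_right = 120_000  # return if we pick the better version
payoff_wrong = 100_000  # return if we pick the worse version

# Acting on the prior alone: pick A, be right 60% of the time
value_without_info = (p_a_better * payoff_right
                      + (1 - p_a_better) * payoff_wrong)

# Perfect information always picks the winner
value_with_info = payoff_right

# Spend on collecting information only if it costs less than this
expected_value_of_information = value_with_info - value_without_info
print(round(expected_value_of_information))  # 8000
```

If running the test costs more than that 8000, the "data driven" choice is not to run it at all.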

How do you make your decisions?

Why do you collect information?

Whale of a Sample Size in Statistical Testing

Australia is taking Japan to court to stop Japan from killing whales in the name of scientific testing. The whales that are captured and killed for “research” are later sold as food. In a year, Japan harpoons and kills about 1000 whales for its research work.

What has this got to do with statistical significance?

We have to go back to 2005, when Japan implemented what it called JARPA-2 against the wishes of the International Whaling Commission. With JARPA-2, Japan increased its whale intake from its previous sampling rate of 440 whales to 1000 whales:

We will implement JARPA-2 according to the schedule, because the sample size is determined in order to get statistically significant results

When everything else is held constant, increasing the sample size from 440 to 1000 will increase statistical significance because of the way the standard error (SE) is computed. The SE, which measures sampling precision, goes from σ/√440 to σ/√1000 – a smaller number that almost guarantees statistical significance. (see reference)
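The arithmetic is a one-liner; holding σ constant at 1 purely for illustration:

```python
import math

sigma = 1.0  # population standard deviation, held constant for illustration

se_440 = sigma / math.sqrt(440)    # ≈ 0.0477
se_1000 = sigma / math.sqrt(1000)  # ≈ 0.0316

# The same observed difference is about 1.5x as many standard errors
# away from zero at n=1000 as at n=440
print(round(se_440 / se_1000, 2))  # 1.51
```

Nothing about the whales changed – only the denominator did.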

Under the cloak of statistical significance, more whales are being sampled without regard to economic or ecological significance.

Consider this in the context of your A/B testing. Yes, even minor differences will appear statistically significant through the magic of large samples. But statistical significance is not sufficient; we need to ask whether these differences have economic significance. Should we chase these tiny differences and lose the opportunity to win over the other 97% who are not converting?