Dim Light and Ambient Noise Increase Creativity … Not among skeptics

Here are two studies that got lots of social media mileage this week

  1. Dimming lights enhances creativity
  2. Right amount of ambient noise increases creativity

Quoting from the light study, Salon writes

during that all-important phase when you’re grasping for ideas, dim light appears to be a catalyst for creativity. “Darkness changes a room’s visual message,” the researchers explain, and one such message seems to be it’s safe to explore widely and let your imagination run free.

And Salon goes on to make a general prescription for all of us

So if you’re struggling to finish that screenplay or come up with the next must-have app, you might try illuminating your workspace with one bare bulb of minimal wattage.

Quoting from the ambient noise study, 99u writes

 moderate level of background noise creates just enough distraction to break people out of their patterns of thinking and nudge them to let their imagination wander, while still keeping them from losing their focus on the project all together. This distracted focus helps enhance your creativity.

And their recommendation is to roam about in hotel lobbies and coffee shops. If you cannot roam about, there are apps for that (with names like Coffitivity).

Before you give these studies a second look, despite the fact that they were published in peer-reviewed journals and found statistically significant differences, stop and ask some critical questions.

  1. Are the effects additive? So would it work better if we roamed about in dim coffee shops?
  2. The ambient noise study reports a difference between 50dB and 70dB. How many other pairs did they test and reject before reporting on the pair that showed a statistically significant difference? (See green jelly beans cause acne.) Only the sound levels that showed a difference get reported; the rest get filed in the round filing cabinet. And journals like publishing only those studies that find a statistically significant difference. Remember that when motivated researchers keep looking for something interesting, they are bound to find it.
  3. What is the measure of creativity? Is it correct and relevant? The ambient noise study used the Remote Associates Test while the dim light study used the Creative Insights Problem. Why did the two studies use two different metrics? Did they try all kinds of measures and pick the one that showed a statistically significant difference? If you repeated the dim light experiment with the Remote Associates Test and the ambient noise experiment with the Creative Insights Problem, would the results hold?
  4. Let us say all of these questions are answered in favor of the tests. Does that mean the results translate into the real world, which has far too many variables? How many hidden hypotheses did the two researchers take for granted in coming up with their results? How many of those will be violated in the real world?
  5. Does statistical significance mean economic significance? What is the economic impact of any perceived difference?
  6. Do you have means of measuring your team's creativity that are based on real-life results and do not involve administering the Remote Associates Test or the Creative Insights Problem? Performance in tests like these is rarely an indication of actual job performance, as Google found out about brainteasers in job interviews.
  7. Don’t forget opportunity cost. You see here two recipes for improving creativity; you will find more if you look for them. All peer reviewed, I bet. Which one can you afford to pick? Or could you be investing your time and resources in something else instead of dimming lights and creating coffee shop noise?
  8. Even if you go on to dim the lights or pipe in coffee shop noise, there is always the Hawthorne effect. Either your team will react to it and tell you they are seeing improvement, or you will convince yourself you are seeing improvement because you do not want to be wrong about your decision to pipe coffee shop noise through the office speakers.
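The multiple-comparisons concern in point 2 is easy to quantify. A minimal sketch (the number of comparisons is hypothetical, not from the study): if researchers test many sound-level pairs at the usual 5% significance level and there is no real effect anywhere, the chance of at least one spurious "finding" grows quickly.

```python
# Familywise error rate: probability of at least one false positive when
# running k independent tests at significance level alpha, given that no
# real effect exists anywhere.
def familywise_error_rate(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

# One comparison: a 5% chance of a spurious "finding".
print(round(familywise_error_rate(1), 3))   # → 0.05
# Ten comparisons (e.g. many dB-level pairs): the chance balloons.
print(round(familywise_error_rate(10), 3))  # → 0.401
```

This is why "we tested many pairs and one was significant" is much weaker evidence than "we tested this one pair".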

Finally, it behooves us to keep in mind the great academic fraud of Diederik Stapel. I am not saying anything remotely close to what Stapel did happened in these two studies. But I am urging you to start with high skepticism and place the burden of proof for applying the findings on the researchers and on those writing derivative works based on the research.

Your opportunity cost of internalizing these studies and acting on them is far too high. You don’t want to bump into other Salon readers wandering in dim and noisy hotel lobbies.

If you want an excuse to get your work done in a coffee shop, do it. But do not try to justify it with scientific reason.

Finally, if you were allowed to believe in only one of the two studies, which one would you believe?

The Hippo Problem in Business Advice

Which animal do you think Hippos are related to?

My guess was the elephant, because in my native language the hippo is called a water-elephant. Hippos are big, grey herbivores that spend most of their time in water, so they must be related to elephants. Or so my culture thought.

In the modern world, early naturalists thought hippos must be related to pigs. I guess they used physical cues like looks and teeth, and the fact that both wallow in mud, and decided on the pig relation.

Not until other scientists looked at the relevant evidence, not just what is convenient, available, fits a story, or supports one's preconceived notions, did we find out the real relation.

Take hippos, for example. Early naturalists thought hippos must be related to pigs. After all they look somewhat alike and have similar teeth. But fossils and genetic studies showed that hippos’ closest living relatives are actually dolphins and whales. (NPR)

No one who looks only at the superficial symptoms and what is overt could have come to this conclusion.

We have a Hippo problem in business advice. More like a Hippo crisis, with bigger ramifications than getting the Hippo-dolphin connection wrong.

Look at successful businesses. Look at their seven traits. If only you had them, your business would be successful too.
Look at this magical number, the Net Enchantment Score, of highly successful companies. If only you got your score to that level, you would be insanely profitable too.
Look at Brand V's successful social media campaign. You had better get on Twitter and start conversations with your customers.

Hundreds of management/marketing/business gurus, thousands of books, and hundreds of thousands of articles bombard us with advice on how we should run a business, based only on what they saw as the relation between Hippo and Pig.

And millions of fans have suspended skepticism to embrace a Guru's preaching and spread it around, taking solace in the numbers. After all, millions of people who read the blog articles and pass them around can't be wrong, and we are not alone in following the Guru's footsteps.

The Hippo problem in business advice is not just the fault of self-confident Gurus, with not an iota of self-doubt, pushing their snake oil with no repercussions. We who accept and embrace these prescriptions without asking difficult questions are a bigger part of the problem. Questions like:

  1. Is my business like the Hippo the Guru is talking about?
  2. What evidence would show the Guru's advice to be wrong? Sure, I run a cupcake store, but does that mean my customers want engagement and not just cupcakes?
  3. What is the opportunity cost of following in the Guru's footsteps and getting it wrong? Should I adopt the razor-blade model just because the Guru says that is the future?

In the recent past, before the explosion of self-anointed Gurus, we had a framework for making decisions. We did not make decisions because someone else did it that way; we looked at our goals, our customers, market dynamics, marketing channels, sales channels and our ability to compete.

Now everyone is a Business Guru. There is no need to look for evidence; the Gurus know. They know just from the few anecdotes they have seen – be it a farmers market vendor, the Grateful Dead or a Harry Potter movie. They have their prescriptions for us on how to run a business, do marketing and price products.

The problem is not going to go away because the Gurus achieve self-realization (pun intended). I think one cannot be a Guru peddling snake-oil prescriptions unless one loses all self-doubt and strongly believes in what one is selling. These people are selling to a segment with a need for magical prescriptions – like engaging in social media, telling stories or being remarkable.

Until we, the consumers of business advice, stop worshipping Gurus and seek relevant evidence, the Hippo problem in business advice is not going away.

Are you ready to stop following your Guru and start asking tough questions?

Can you add my one question to your survey?

No sooner do you let it be known, mostly inadvertently, that you are about to send out a survey to customers than the incessant requests (and commands) start coming from your co-workers (and bosses) to add just one more question to it. Just one more question they have been dying to find the answer to but have never gotten around to running a survey, or anything else, to answer.

Just one question, right? What harm can it do? Sure, you are not opening the floodgates and adding everyone's question, just one question to satisfy the HiPPO?

Maybe I am being unfair to our colleagues. It is possible it is not them asking to add one more question; it is usually us who are tempted to add just one more question to the survey we are about to send out. If survey takers are already answering a few questions, it can't be that bad for them to answer one more?

The answer is: yes, of course it can be really bad. Resist any arm-twisting, bribing and your own temptation to add that one extra question to a carefully constructed survey. That is, I am assuming you did carefully construct the survey; if not, sure, add them all. The answers are meaningless and un-actionable anyway.

To define what a carefully constructed survey means, we need to ask, “What decision are you trying to make with the data you will collect?”

If you do not have decisions to make, if you won't do anything different based on the data collected, or if you are committed to whatever you are doing now and are only collecting data to satisfy the itch, then you are doing it absolutely wrong. And in that case, yes, please add that extra question from your boss for some brownie points.

So you do have decisions to make and have made sure the data you seek is not available through any other channel. Then you need to develop a few hypotheses about the decision. You do that by doing background exploratory research, including customer one-on-one interviews, social media search analysis and, if possible, focus groups. Yes, we are actually paid to make better hypotheses, so you should take this step seriously.

For example, your decision is how to price a software offering, and your hypotheses are about the value perception of certain key features and consumption models.

Once you develop a minimal set of well-defined hypotheses to test, you design the survey to collect data to test those hypotheses. Every question in your survey must serve to test one or more of the hypotheses. On the flip side, you may not be able to test all your hypotheses in one survey, and that is okay. But if there is a question that does not serve to test any of the hypotheses, then it does not belong in that survey.

The last step is deciding the relevant target mailing list you want to send the survey to. After all, there is no point in asking the right questions of the wrong people.

Now you can see what adding that one extra question from your colleague does to your survey. It did not come from your decision process, it does not help test your hypotheses, and it is most likely not relevant to the sample set you are using.

Estimate the amount of cash in Brad Pitt’s wallet

This isn’t original; I read it somewhere. Nevertheless, it serves to explain estimation, confidence intervals and precision.

Suppose I asked you to estimate how much cash Brad Pitt carries in his wallet. What would be your guess?

It is hard to guess right. It is hard because I asked you for a single number, and given that there are many possibilities (even with just whole numbers), your answer is almost certainly going to be wrong. With a single-value guess you also cannot convey how confident you are about the estimate. That is the problem with making single-value estimates, be it estimating cash in a wallet or the expected revenue impact of a marketing campaign. Don't give a single number, and don't trust anyone giving you a single number.

What if we asked 1000 random people on the street, collected all their answers and averaged them? Would that give the right answer? Isn't that the wisdom of the crowd? Well, the average won't be the right answer. But if you plot each value against the number of people who gave it (a histogram), you will likely see a normal curve. The distribution tells us the low and high values for the cash in Brad Pitt's wallet, and also the chance that the true amount falls outside this range.

Suppose 95% of the responses fall between $10 and $978. Then we could say, “we are 95% confident Brad has between $10 and $978.” We could be wrong in saying “we are 95% confident” if we received homogeneous answers and hence got too narrow a range.
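To make the "ask 1000 people" idea concrete, here is a small sketch. The guesses are simulated (a lognormal spread is my assumption for a long-tailed quantity like wallet cash, not data from any real poll); the point is that the middle 95% of responses gives an empirical interval.

```python
import random

random.seed(0)  # reproducible illustration

# 1000 hypothetical guesses with a long right tail
guesses = sorted(random.lognormvariate(4, 1.2) for _ in range(1000))

# The empirical 95% interval: drop the bottom and top 2.5% of answers
low = guesses[int(0.025 * len(guesses))]
high = guesses[int(0.975 * len(guesses))]
print(f"95% of guesses fall between ${low:.0f} and ${high:.0f}")
```

The width of that interval, not the average alone, is what tells you how much the crowd actually knows.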

Instead of asking 1000 people, what if I asked you not for a single value but for your 95% confidence interval for the amount of cash in his wallet? What would be your answer?

It is the equivalent of asking 1000 different people. And since I asked for 95% confidence, you should give a range wide enough that there is only a 5% chance the real answer falls outside it. I am not asking for precision (so don't try to give a narrow range) but for a high confidence level (so you should go wide).

Since you do not know anything about the cash-carrying habits of Hollywood stars, you should trade off precision for confidence. You could answer $0 to $100,000. That is acceptable, but too wide a range to be of real use in cases other than estimating Brad Pitt's wallet. However, you can apply your knowledge about stuffing bills in a wallet and give a better range, like $10 to $2,000.

That is what you do when measuring the outcomes of events with many unknowns. You break down the big unknown into a set of component unknowns, and for each smaller unknown you make an estimate at a given confidence level. Stating a range with a confidence level (a confidence interval), based on the application of prior knowledge, is far better and more usable than a single number that we are asked to trust.
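The "break the big unknown into component unknowns" step can be sketched as a simple Monte Carlo simulation. All the ranges below are hypothetical, made up purely to illustrate the mechanics of combining component estimates:

```python
import random

random.seed(1)  # reproducible illustration

# Each component unknown gets its own estimated range (uniform here for
# simplicity; the shape matters less than the act of combining ranges).
def one_scenario():
    reach = random.uniform(10_000, 50_000)    # people who see the campaign
    conversion = random.uniform(0.01, 0.05)   # fraction who buy
    avg_order = random.uniform(20, 60)        # dollars per order
    return reach * conversion * avg_order

outcomes = sorted(one_scenario() for _ in range(10_000))
low, high = outcomes[500], outcomes[9500]     # middle 90% of outcomes
print(f"90% confident: revenue impact between ${low:,.0f} and ${high:,.0f}")
```

The output is a range with a confidence level attached, exactly the form of estimate argued for above, rather than one unqualified number.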

“The marketing campaign will increase sales by 45%”

“We are 90% confident the marketing campaign will result in sales increase in the range of 20% to 43%”

Which one of these two is more trustworthy?


It is likely better to speak in absolutes

You read only interesting findings because only those get published, written about and popularized in social media. Experiments that find no statistically significant difference never leave the filing cabinets of researchers, because no one wants to read a story where nothing happens. This is such an experiment, one where there was not enough evidence to reject the null hypothesis.

Let us start at the beginning. This experiment is about people's perception of a person's competence based on whether the person speaks in absolutes, with no room for alternatives, or in terms of likelihood, accounting for alternative explanations.

There are plenty of examples of those who speak in absolutes with no self-doubt: read any CEO interview (enterprise or startup), any management guru's book or Seth Godin's blog. Examples are,

“Revenue grew because of our marketing”
“Sales fell because of Europe”
“Groupon works, it really works”

An example of speaking in terms of likelihood comes from Nobel laureates in economics,

“Answers to questions like that require careful thinking and a lot of data analysis. The answers are not likely to be simple.”

Hypotheses: You do start with hypotheses before any data analysis, don't you?

Here are the hypotheses I had about speaking in absolutes/likelihoods and perception of competence.

H1: Business leaders are judged to be more competent when they speak in absolutes. Conversely, using terms like “likely” may be perceived as wishy-washy and hence signal incompetence.

H2: Scientists are judged to be more competent when they use likelihoods and avoid absolutes. (Because scientists are expected to think about all aspects, and anyone who zeroes in on one factor must not know how to think about scenarios.)

Of course, the null hypothesis is that there is no statistically significant difference in perception of competence based on whether the subject speaks in absolutes or in likelihoods.

Experiment Design: I designed a simple 2x2 experiment using SurveyGizmo. You can see the four groups: Company Executive and Scientist on one dimension, Absolutes and Likelihoods on the other. I designed a set of four statements with these combinations. When people clicked on the survey they were randomly shown one of the four options.

Here is one of the four statements

This was a very generic statement meant to speak about results and what could have caused them. I avoided specific statements because people's domain knowledge and preconceived notions would come into play. For example, if I had used a statement about lean startups or social media, it would have introduced significant bias into people's answers.

Based on just this one statement, without context, people were asked to rate the competence of the person. Some saw it attributed to a Scientist, some to a Company Executive.

Note that an alternative design is to show both the Absolute and the Likelihood statement and ask respondents to pick the one they believe shows more competence. I believe that would introduce experimental bias, as people may start to interpret the difference between the two statements.

Results: I collected 130 responses, almost evenly split between the four groups, and did t-tests on the mean rating between the groups (Scientist: Absolute/Likelihood, Executive: Absolute/Likelihood, Absolute: Executive/Scientist, Likelihood: Executive/Scientist). And you likely guessed the results from my opening statements.

There is not enough evidence to reject the null hypothesis in any of the tests. That means any difference we see in the competence perception of those speaking in absolutes and those speaking in likelihoods is just random.
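For readers who want to see the mechanics of the group comparison, here is a minimal sketch using Welch's t-statistic. The ratings below are invented for illustration; they are not the survey's actual responses.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5

# Hypothetical 1-7 competence ratings for two of the four groups
absolute = [5, 4, 6, 5, 3, 5, 4, 6, 5, 4]
likelihood = [4, 5, 5, 6, 4, 3, 5, 5, 6, 4]

t = welch_t(absolute, likelihood)
# With roughly 18 degrees of freedom, the 5% critical value is about 2.1;
# a |t| below that means we cannot reject the null hypothesis.
print(round(t, 2))  # → 0.0
```

When the statistic falls below the critical value, as it does for these made-up samples, "not enough evidence to reject the null" is the only honest conclusion.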

What does this mean to you?

Speaking in absolutes, a trait that leaders cultivate to be seen as competent and decisive, has no positive effect. Including uncertainties does not hurt either.

So go right ahead and present simplistic, one-size-fits-all solutions without self-doubt. After all, stopping to think about alternatives and uncertainties only takes time and hurts one's brain, with no positive effect on the audience.

Caveats: While competence perception is not affected, I believe trust perception could be different. That requires another experiment.

4 Ways You Can Put Google Customer Surveys To Work Today

As I previously wrote, Google Customer Surveys is a true business model innovation. It helps publishers unlock value from their digital assets and enables market researchers to reach new audiences they otherwise would not have found. I expressed my reservations about their positioning in my previous article:

But I do not get what they mean by “look for correlations between questions” and definitely don't get “pull out hypotheses”. It is us, the decision makers, who make the hypotheses in hypothesis testing. We are paid to make better hypotheses that are worthy of testing.

Since I wrote that article, their Product Manager emailed to say they removed the statement about “pull out hypotheses”.

This is a limited tool, with the ability to ask just one question and no way to ensure that the same user answers multiple questions, which rules out customer-level analysis.

One more limitation is the minimum sample size: you cannot order fewer than 1000 responses.

Despite these reservations I see Google Customer Surveys as an effective tool for product/brand managers, researchers and small businesses for these purposes:

1. Aided Recall: Present a choice of different brands and ask respondents how many of them they recognize.
When you are trying to get quick, high-level data on customer awareness or preference for your brand, this is a great tool. The results are especially actionable when you get extreme results, like no one knowing about you.
If you are trying to find which brand they recognize the most, you can do that as well with a different question type. However, due to its question format limitations, Google Customer Surveys cannot help with unaided recall.

2. Finding the Consideration Set: Present a choice of different brands and ask respondents which of them they would consider buying to solve a particular need. This is similar to Aided Recall, but the question is more focused. You are not simply asking about awareness but whether your brand makes it into their consideration set.

3. Brand Association: Present an image or a statement and ask respondents to pick the tag-line or brand they believe goes with it. Another variation of this question is asking them to associate your brand with an unrelated field. A typical example is, “if our brand were a movie actor, who would it be?”

The ability to use images is a very powerful feature. It creates many different opportunities, for example testing your advertising copy or the images you use in your collateral. It is better to poll your audience on whether the image you used looks more like a bean bag or a boxing glove before you launch your expensive advertising campaign.

4. Consumer Behavior Research: This is a whole class of hypothesis testing you can do with Google Customer Surveys. While it is not a tool for A/B split testing, you can use it to test your hypotheses about customer preferences or their susceptibility to anchors and other nudges. Before collecting results you need to specify a reasonable hypothesis that is worth testing. Once you collect the data, you can test for statistical significance using a chi-square test to validate your hypothesis. Do keep in mind that sometimes the data can fit more than one hypothesis.
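As a sketch of what that chi-square test looks like (the counts below are made up, not real survey data), Pearson's statistic for a 2x2 table can be computed directly:

```python
# Hypothetical 2x2 contingency table: respondents who chose the target
# option, split by which anchor they were shown (made-up counts).
observed = [[60, 40],   # high anchor: chose / did not choose
            [45, 55]]   # low anchor:  chose / did not choose

rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
total = sum(rows)

# Pearson's chi-square: sum over all cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
chi2 = sum(
    (observed[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
    for i in range(2)
    for j in range(2)
)

# The 5% critical value for 1 degree of freedom is 3.84; a larger
# statistic rejects the null hypothesis of no association.
print(round(chi2, 2))  # → 4.51
```

Here the made-up counts happen to clear the 3.84 threshold, so for this illustrative table you would conclude the anchor is associated with the choice.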

There is, however, a big limitation in the length of the questions you can ask (as you see in the third option in the image on the left).

There you have it: a tool with limitations, but effective for specific purposes. It opens up new ways to collect data and test hypotheses where none existed before.

A corollary to this post would be the cases where you should not use this tool. Those include finding the price customers are willing to pay or asking them how important a single feature is. You will have to wait for another post for the reasons.