Let us hunt for something interesting in this data gold mine

How many times have you heard this?

We are collecting a lot of data on our customers/transactions/sales/logs; let us look at this gold mine and see if we can find anything interesting.

The problem with seeking something interesting is that you are bound to find it. You might call that tautological, but the fact is that if you go looking for something interesting, you will find it. To a determined data miner, some interesting statistical outlier will eventually show up; after that, it is simply a matter of writing a hypothesis that "predicts" it.
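The statistical name for this trap is the multiple-comparisons problem. Here is a minimal sketch in Python. It assumes purely random data, and the sizes, the seed and the function names `pearson_r` and `mine_noise` are illustrative only. The sketch correlates one outcome against many random "features" and counts how many pass a conventional 5% significance test. Since there is nothing real in the data, every hit is a false positive. With 200 features, you should expect roughly 200 × 0.05 = 10 of them.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mine_noise(n_rows=500, n_features=200, alpha=0.05, seed=42):
    """Correlate one outcome with many purely random (hypothetical) features.

    Returns (raw_hits, corrected_hits): features whose |r| passes a two-sided
    test at level alpha, without and with a Bonferroni correction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Under the null hypothesis r * sqrt(n) is roughly standard normal for
    # large n, so a z cutoff divided by sqrt(n) gives an approximate r cutoff.
    z_raw = NormalDist().inv_cdf(1 - alpha / 2)
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * n_features))
    raw_hits, corrected_hits = [], []
    for j in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson_r(feature, outcome)
        if abs(r) > z_raw / math.sqrt(n_rows):
            raw_hits.append((j, r))
        if abs(r) > z_bonf / math.sqrt(n_rows):
            corrected_hits.append((j, r))
    return raw_hits, corrected_hits

raw, corrected = mine_noise()
print(f"uncorrected: {len(raw)} of 200 noise features look 'significant'")
print(f"Bonferroni:  {len(corrected)} of 200")
```

The Bonferroni correction is one standard guard: it raises the bar in proportion to the number of things you tried. The more robust guard, in the spirit of this post, is to state the hypothesis before looking at the data.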

Data mining, or as some might call it, data trolling, is looking for patterns in data that happens to be sitting around. Deliberate decision making is different: it requires seeking specific information to reduce uncertainty. But data can fit any number of hypotheses, and when mining for a cause we are bound to pick:

  • Ones that are most convenient, like the man searching for his lost keys under the streetlight because that is where the light is.
  • Ones that are familiar, based on our past experience and our beliefs. There are many fables about this.

The way to make informed decisions is to frame a hypothesis based on the best prior knowledge we have. Know that this is just a hypothesis, not a fact, and that it has uncertainty associated with it. Then collect specific data to refine it and reduce that uncertainty.
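One way to make "refine the hypothesis and reduce the uncertainty" concrete is Bayesian updating. The sketch below is minimal and rests on a few assumptions. The hypothesis concerns a single rate, such as the fraction of customers who respond to an offer. The belief about that rate is a conjugate Beta distribution. The prior parameters, the batch counts and the `BetaBelief` class are invented for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown rate, expressed as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def sd(self) -> float:
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: the posterior is again a Beta distribution.
        return BetaBelief(self.alpha + successes, self.beta + failures)

# Hypothetical prior hypothesis: "about 10% respond", held loosely.
belief = BetaBelief(2, 18)
# Hypothetical batches of deliberately collected data: (responded, did not).
batches = [(3, 47), (12, 88), (25, 225)]

history = [belief]
for s, f in batches:
    belief = belief.update(s, f)
    history.append(belief)

for step, b in enumerate(history):
    print(f"step {step}: mean={b.mean():.3f}  sd={b.sd():.3f}")
```

The standard deviation shrinks with each batch of targeted data but never reaches zero. The hypothesis becomes better supported, yet it is never certain.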

We will never know all the facts with certainty, but if we realize that what we know has uncertainty associated with it and there could be far more that we do not yet know, we are on the right track.

How do you make your decisions?

One thought on “Let us hunt for something interesting in this data gold mine”

  1. Excellent point. I call data trolling a Serendipity Methodology: you are looking for a needle in a haystack and hoping to find a farmer’s daughter. It may happen, but the probability is too low for me to take the bet.


