Answering the “Why doesn’t the data match?” Question


Anyone who has been working with web analytics for more than a week or two has inevitably asked or been asked to explain why two different numbers that “should” match don’t:

  • Banner ad clickthroughs reported by the ad server don’t match the clickthroughs reported by the web analytics tool
  • Visits reported by one web analytics tool don’t match visits reported by another web analytics tool running in parallel
  • Site registrations reported by the web analytics tool don’t match the number of registrations reported in the CRM system
  • Ecommerce revenue reported by the web analytics tool doesn’t match that reported from the enterprise data warehouse

In most cases, the “don’t match” means +/- 10% (or maybe +/- 15%). And, seasoned analysts have been rattling off all the reasons the numbers don’t match for years. Industry guru Brian Clifton has written (and kept current) the most comprehensive of white papers on the subject. It’s 19 pages of goodness, and Clifton notes:

If you are an agency with clients asking the same accuracy questions, or an in-house marketer/analyst struggling to reconcile data sources, this accuracy whitepaper will help you move forward. Feel free to distribute to clients/stakeholders.

It can be frustrating and depressing, though, to watch the eyes of the person who insisted on the “match” explanation glaze over as we try to explain the various nuances of capturing data from the internet. After a lengthy and patient explanation, there is a pause, and then the question: “Uh-huh. But…which number is right?” I mentally flip a coin and then respond either, “Both of them” or “Neither of them” depending on how the coin lands in my head. Clifton’s paper should be required reading for any web analyst. It’s important to understand where the data is coming from and why it’s not simple and perfect. But, that level of detail is more than most marketers can (or want to) digest.

After trying to educate clients on the under-the-hood details…I almost always wind up at the point where I’m asked the “Well, which number is right?” question. That leads to a two-point explanation:

  • The differences aren’t really material
  • What matters in many, many cases is the trend and change over time of the measure, not its perfect accuracy (as Webtrends has said for years: “The trends are more important than the actual numbers. Heck, we put ‘trend’ in our company name!”)
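
For the numerically inclined, here is a minimal Python sketch of those two points. The weekly visit counts are made up for illustration (they are not from any real tool), and the helper names `pct_diff` and `period_changes` are my own. It assumes the two tools differ by a roughly constant percentage, which is the typical “+/- 10%” situation described above.

```python
def pct_diff(a, b):
    """Difference of a relative to b, expressed as a percentage of b."""
    return (a - b) / b * 100.0


def period_changes(series):
    """Period-over-period percent change for a list of counts."""
    return [pct_diff(cur, prev) for prev, cur in zip(series, series[1:])]


# Hypothetical weekly visits from two tools running in parallel.
# tool_b consistently reports about 10% fewer visits than tool_a.
tool_a = [1000, 1100, 1210, 1089]
tool_b = [900, 990, 1089, 980]

# How far apart the two tools are in each week (the "don't match" gap).
level_gaps = [round(pct_diff(b, a), 1) for a, b in zip(tool_a, tool_b)]

# Week-over-week change as seen by each tool (the trend).
trend_a = [round(x, 1) for x in period_changes(tool_a)]
trend_b = [round(x, 1) for x in period_changes(tool_b)]

print("Gap between tools (%):", level_gaps)
print("Trend from tool A (%):", trend_a)
print("Trend from tool B (%):", trend_b)
```

With numbers like these, the gap between the tools never closes, yet both tools tell the same week-over-week story. Real data will be noisier, so treat this as an illustration of the argument rather than a reconciliation method.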

This discussion, too, can have frustrating results.

I’ve been trying a different tactic entirely of late in these situations. I can’t say it’s been a slam dunk, but it has had some success. The approach is to list out a handful of familiar situations where we get discrepant measures and are not bothered by it at all, and then map those back to the data in question.

Here’s my list of examples:

  • Compare your watch to your computer clock to the time on your cell phone. Do they match? The pertinent quote, most often attributed to Mark Twain, is as follows: “A man with one watch knows what time it is; a man with two watches is never quite sure.” Even going to the NIST Official U.S. Time Clock will yield results that differ from your satellite-synched cell phone. Two (or more) measures of the time seldom match up, and we’re comfortable with a 5-10 minute discrepancy.

Photo courtesy of alexkerhead

  • Your bathroom scale. You know you can weigh yourself as you get out of the shower first thing in the morning, but, by the time you get dressed, get to the doctor’s office, and step on the scale there, you will have “gained” 5-10 lbs. Your clothes are now on, you’ve eaten breakfast, and it’s a totally different scale, so you accept the difference. You don’t worry about how much of the difference comes from each of the contributing factors you identify. As long as you haven’t had a 20-lb swing since your last visit to the doctor, it’s immaterial.

Photo courtesy of dno1967

  • For accountants…“revenue.” If the person with whom you’re speaking has a finance or accounting background, there’s a good chance they’ve been asked to provide a revenue number at some point and had to drill down into the details: bookings or billings? GAAP-recognized revenue? And, within revenue, there are scads of nuances that can alter the numbers slightly…but almost always in non-material ways.

Photo courtesy of alancleaver_2000

  • Voting (recounts). In close elections, it’s common to have a recount. If the recount re-affirms the winner from the original count, then the result is accepted and everyone moves on. There isn’t a grand hullabaloo about why the recount numbers differed slightly from the original count. In really close races, where several recounts occur, the numbers always come back differently. And, no one knows which one is “right.” But, once there is a convergence as to the results, that is what gets accepted.

Photo courtesy of joebeone

    That’s my list. Do you have examples that you use to explain why there’s more value in picking either number and interpreting it than in obsessing about reconciling disparate numbers? I’m always looking for other analogies.

    3 Comments


    1. Pingback Daily Digest for May 18th at dandube.com

    2. Great post Tim on an all-too-common challenge.

      I think there is often a third answer to the “which number is right?” question. “They both are in the context of the tool and the way they collect data.”

      It is the way that we (or vendors) put those numbers out there in a way that is understandable to our audience that may be imprecise. Of course, we may have to do that so that the information is digestible.

      This type of imprecision doesn’t bother me in the least, but is often something those who don’t use the data every day have trouble overcoming.

    3. Pingback 11 Ways Humans Kill Good Analysis | Retail: Shaken Not Stirred by Kevin Ertell
