Your Customer Data Is Dirtier than You Think


Wow, has it ever been a while since I got a new post up. Three-part excuse: 1) Vacation that was wayyyy offline, 2) Vacation was preceded by a crazy week on the road for business, and 3) I’ve been working on a post that’ll turn out to be three separate posts. Stay tuned for that, as it’s my assessment/lessons learned from iterations over the course of a year on a corporate dashboard.

Immediately upon returning from vacation, I had a customer data management experience that highlighted just how bad we are as marketers at successfully tracking our customers and keeping our information on them current. Since we were gone for 10 days, we had our mail stopped. I returned home several days before the rest of my family, which meant I was on hand when the post office dropped off the bin of accumulated mail. As I sorted through it, I wound up with a surprisingly large stack of mail that was not intended for anyone in our house.

The next day — the first day that only a single day’s worth of mail came — I did a little analysis. We got seven pieces of mail (pictured at left). Of those, only three had totally correct information on the address:

  • One piece was for the prior owner of the house; we’ve now owned the house for a year (side note: the prior owners were on the opposite end of the political spectrum from us, and, sorting through the 10 days of mail, the literature the prior owners received from “their” Presidential candidate outnumbered the literature we received from “ours” by 3:1)
  • One piece included my wife’s middle initial…but it was not the correct middle initial
  • Two pieces were addressed to the wife of the couple we bought our last house from…in another state, five years ago!

We’ve gotten to the point, I think, where we just accept this. The problem is that we’re starting to get too smart for our own good. Sending mail to someone who no longer lives at the address has always happened.

Having a minor data entry issue — mistyping a middle initial — is going to happen any time there is human involvement (we can trace back the fact that I still occasionally get mail for “Jim Wilson”, rather than “Tim Wilson,” to a single phone company screw-up shortly after we got married a decade-and-a-half ago).

What was really interesting, though, was what happened when companies tried to address the first issue (identifying when people moved) and generated the last issue. In my wildly-not-statistically-valid anecdote of a single day, trying to “fix” the first issue generated twice as much misdirected mail as the original problem did (two pieces versus one).

What I’m Sure Happened

We bought a house in Austin five years ago from the couple who built it and lived in it for 25 years; in that time, they had gotten embedded into countless systems with that address. We continued to receive a steady stream of mail for them for the entire time we lived in that house, and it was not all junk mail by any means. To make things a little tougher on the senders, we shared the same last name with the prior owners. At some point while we owned that house, some of those companies undoubtedly implemented some sort of customer data integration system that hooked into external data sources to try to sniff out when their customers moved. The problem? Much of their data was already outdated…and they didn’t have a way to identify which data that was.

So, when a “relocation” was picked up — our sale of our house in Austin and the purchase of our house in Ohio — all of the identified “residents” of the Austin house were “moved” with us.
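To make that guess concrete, here is a minimal Python sketch of what I suspect happened. Everything in it is hypothetical: the record layout, the function names, the 365-day freshness threshold, the addresses, the dates, and the made-up “Pat Wilson” standing in for the prior owner. It is not how any real customer data integration product works; it just contrasts an update that moves everyone at an address with one that also checks who actually moved and how stale the record is.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    name: str
    address: str
    last_verified: date  # last time the customer confirmed this address (assumed field)


@dataclass(frozen=True)
class MoveEvent:
    mover_name: str
    old_address: str
    new_address: str
    moved_on: date


def naive_update(records, move):
    """Move every record at the old address, no matter whose record it is."""
    return [
        replace(r, address=move.new_address) if r.address == move.old_address else r
        for r in records
    ]


def cautious_update(records, move, max_age_days=365):
    """Move a record only if the name matches the mover and the address is fresh."""
    updated = []
    for r in records:
        at_old_address = r.address == move.old_address
        same_person = r.name.lower() == move.mover_name.lower()
        fresh = (move.moved_on - r.last_verified).days <= max_age_days
        if at_old_address and same_person and fresh:
            r = replace(r, address=move.new_address)
        updated.append(r)
    return updated


# Illustrative data only: placeholder addresses and dates.
AUSTIN = "1 Example St, Austin, TX"
OHIO = "2 Sample Ave, Ohio"

records = [
    CustomerRecord("Tim Wilson", AUSTIN, date(2007, 6, 1)),  # current resident
    CustomerRecord("Pat Wilson", AUSTIN, date(2001, 1, 1)),  # stale prior-owner record
]
move = MoveEvent("Tim Wilson", AUSTIN, OHIO, date(2007, 9, 1))

naive = naive_update(records, move)
cautious = cautious_update(records, move)

print([r.address for r in naive])
print([r.address for r in cautious])
```

The naive version relocates the stale prior-owner record along with mine, which is exactly the mail I am now getting. The cautious version leaves that record alone, but it only works if the system tracks something like a last-verified date, and that is the piece I suspect was missing.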

My key takeaways from this, both really of the “keep in mind” variety, are:

  1. Your customer data is always much dirtier than most people in your company assume it to be. A key role for the data analyst is to have a more realistic understanding and be the voice of reason when it comes to requested analysis projects or the planning of marketing campaigns that rely on the data being cleaner than it actually is.
  2. The only real “solution” to this issue immediately dives into 1984-like paranoia: a single universal “profile” (or just a handful of them) that the customer maintains and that other systems can reference so their data stays current. OpenID is a move in this direction…but sidesteps the paranoia by being simply an identifier (OpenID itself doesn’t store any information about you: your name, social security number, address, friends, etc.). The issue almost seems intractable, since any movement towards a universal identifier equates to twice as much ratcheting up of privacy concerns.

It’s not pretty, is it?
