<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gilligan on Data by Tim Wilson &#187; Data Management</title>
	<atom:link href="http://www.gilliganondata.com/index.php/category/data-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.gilliganondata.com</link>
	<description>Thoughts, musings, and, hopefully, not too many redundancies on the world of business data. If you missed the irony in the previous sentence, you may struggle with my writing style.</description>
	<lastBuildDate>Thu, 09 Feb 2012 11:00:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
  <link>http://www.gilliganondata.com</link>
  <url>http://www.gilliganondata.com/favicon.ico</url>
  <title>Gilligan on Data by Tim Wilson</title>
</image>
		<item>
		<title>Privacy: It&#8217;s a 2.5-Dimensional Issue</title>
		<link>http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/</link>
		<comments>http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/#comments</comments>
		<pubDate>Tue, 26 Apr 2011 16:37:30 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[Web Analytics]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=1270</guid>
		<description><![CDATA[I&#8217;m keeping the voting open for another week or so on my &#8220;choose a new profile picture&#8221; poll, so if you haven&#8217;t voted yet, please click over and do so. There&#8217;s a charitable donation (by me!) involved! &#8220;Privacy&#8221; is a hot topic in the world of marketing analytics, driven primarily …]]></description>
			<content:encoded><![CDATA[<p><em>I&#8217;m keeping the voting open for another week or so on my <a title="It’s Time for a Change, and I Need Your Help" href="http://www.gilliganondata.com/index.php/2011/04/19/its-time-for-a-change-and-i-need-your-help/" target="_blank">&#8220;choose a new profile picture&#8221; poll</a>, so if you haven&#8217;t voted yet, please click over and do so. There&#8217;s a charitable donation (by me!) involved!</em></p>
<p>&#8220;Privacy&#8221; is a hot topic in the world of marketing analytics, driven primarily by shifting consumer (and, in turn, regulatory) sentiment on the subject. That shifting sentiment, I think, is largely being driven by the increasing integration of social media into our lives and our online behavior.</p>
<p>The WAA stepped up and put together a <a title="WAA Code of Ethics" href="http://www.webanalyticsassociation.org/?page=codeofethics" target="_blank">Code of Ethics</a> a few months ago, and privacy is going to be a recurring topic at eMetrics and other conferences for the foreseeable future. Following the San Francisco eMetrics conference, <a title="@immeria - Stephane Hamel" href="http://twitter.com/immeria" target="_blank">Stéphane Hamel</a> put together <a title="Web analytics ethic trivia" href="http://blog.immeria.net/2011/03/web-analytics-ethic-trivia.html" target="_blank">three scenarios</a> and asked the <a title="#measure on Twitter" href="http://search.twitter.com/search?q=%23measure" target="_blank">#measure community</a> to vote as to the ethics and allowability of each situation. He then <a title="Web analytics ethic: from theory to practice" href="http://blog.immeria.net/2011/03/web-analytics-ethic-from-theory-to.html" target="_blank">revealed the results and added his own thoughts</a>. Towards the end of that second post, Stéphane noted that he was disappointed by the lack of interest in the exercise, given the generally accepted importance of the topic.</p>
<p><a title="Crepuscular Light - Emer Kirrane" href="http://www.emerkirrane.com/" target="_blank">Emer Kirrane</a> responded in the comments:</p>
<blockquote><p>It&#8217;s interesting that there seems to be a correlation between legality and ethics in the minds of your respondents. To me, the Code of Ethics is there as a flag against practices that are deemed unethical by the community, rather than deemed unethical by law.</p></blockquote>
<p>Stéphane&#8217;s concern and Emer&#8217;s response have been bouncing around in my brain for several weeks. My conclusion: &#8220;ethics vs. legality&#8221; is going to continue to give us fits.</p>
<p>I realize this isn&#8217;t the first time that &#8220;ethics&#8221; and &#8220;the law&#8221; haven&#8217;t perfectly aligned (they almost never do, actually, even though that, from a purist point of view, is the goal), but bear with me &#8212; it&#8217;s worth using that lens to explore the issue and outline the challenges we&#8217;re going to have to deal with. These are two very different dimensions of the privacy debate, and one of them is in flux on several fronts.</p>
<h3>Why 2.5 Dimensions?</h3>
<p>Obviously, there is a legal/regulatory dimension, and there is an ethical dimension. But, really, the legal/regulatory dimension is <em>heavily</em> driven and influenced by consumer perceptions and fears. I actually <a title="Fear vs. Convenience" href="http://www.gilliganondata.com/index.php/2009/01/29/fear-vs-convenience-the-customer-data-conundrum/" target="_blank">wrote some thoughts on that</a> a couple of years ago. With high-profile Facebook snafus and high-profile media outlets reporting on cookies and cross-site tracking, politicians have found an issue that their constituents care about (or can be prodded to care about). So, in a sense, the legal/regulatory dimension has some added &#8220;oomph&#8221; of consumer concerns behind it; I&#8217;m calling that &#8220;consumer perception&#8221; another half a dimension.</p>
<p>It&#8217;s possible that &#8220;consumer perception&#8221; should be a third dimension in and of itself. But, oh boy, that <em>would</em> make for some hairy sketching in the remainder of this post. I&#8217;m pretty sure I&#8217;m not just punting, though &#8212; the will of the consumer when it comes to something like privacy does generally get manifested through some form of government regulation.</p>
<h3>Start with the Basics</h3>
<p>Two dimensions: legal and ethical. We can look at them like this:</p>
<p><img class="aligncenter size-full wp-image-1273" title="privacy_1" src="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_1.png" alt="" width="488" height="414" /></p>
<p>Various practices raise privacy questions. In theory, we can plot each of them on this (conceptual) grid &#8212; there are more than shown here, but I&#8217;m just laying out the basic idea of the framework:</p>
<p><img class="aligncenter size-full wp-image-1274" title="privacy_2" src="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_2.png" alt="" width="488" height="414" /></p>
<h3>In Theory, We&#8217;d Have Harmonious Dimensions</h3>
<p>If life were simple, we would have perfect clarity for each dimension, and perfect alignment <em>between</em> dimensions:</p>
<p><img class="aligncenter size-full wp-image-1275" title="privacy_3" src="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_3.png" alt="" width="488" height="414" /></p>
<p>Notice the shaded quadrants at top left and bottom right &#8212; there would be <em>no practices</em> that were ethical but not legal, nor would there be any practices that were legal but unethical.</p>
<h3>Alas! Privacy is Rife with Gray Areas!</h3>
<p>Reality is more like this &#8212; gray areas rather than hard lines along both dimensions:</p>
<p><img class="aligncenter size-full wp-image-1276" title="privacy_4" src="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_4.png" alt="" width="488" height="414" /></p>
<p>Ugh. Things get messy. There are more activities that are questionable &#8212; they may or may not be legal and/or they may or may not be ethical! Argh!</p>
<h3>But Wait! There&#8217;s More!</h3>
<p>Ever since the web went mainstream, it&#8217;s been a more global medium than anything that came before. And, we&#8217;ve all run into cases and concerns that our standard web analytics implementation runs afoul of the law in some country somewhere. This grid illustrates that wrinkle, too &#8212; the legal/regulatory gray areas live in different places depending on the country (only the U.S. and the E.U. are shown here &#8212; it&#8217;s an illustrative diagram, people! Not a comprehensive one!):</p>
<p><a href="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_5a.png"><img class="aligncenter size-full wp-image-1288" title="privacy_5a" src="http://www.gilliganondata.com/wp-content/uploads/2011/04/privacy_5a.png" alt="" width="486" height="418" /></a></p>
<p>And the big blue arrow shows where pressure is being applied (back to that half-dimension of consumer fears mentioned at the beginning of this post). It&#8217;s a little counterintuitive that the arrow is pointing upward, isn&#8217;t it? How could it be that things are trending <em>towards</em> &#8220;allowed?&#8221; They&#8217;re not. Rather, the &#8220;interpretation zone&#8221; is moving upward &#8212; practices that used to be &#8220;clearly allowed&#8221; aren&#8217;t inherently changing what they are, but those practices are moving from &#8220;in the clear&#8221; towards the gray area.</p>
<h3>Helpful?</h3>
<p>This was definitely one of those situations where, when I initially had a rough picture in my mind that would represent these two dimensions, it was simple and clear. It was only as I put pen to paper to sketch it out that it turned out to be tricky. Shortly after I finished writing this post (but, obviously, before I published it&#8230;as I&#8217;m adding this comment at the end), <a title="@usujason" href="http://twitter.com/usujason" target="_blank">Jason Thompson</a> made a really good case as to what is (misguidedly) <a title="Don't Track Me! - Jason Thompson" href="http://emptymind.org/dont-track-me-dude/" target="_blank">driving the legal dimension out of alignment with the ethical perspective</a>. That reminded me that I keep meaning to go back and re-read the last chapter (chapter 9?) of <a title="Social Media Metrics by Jim Sterne" href="http://www.amazon.com/gp/product/0470583789/ref=as_li_ss_tl?ie=UTF8&amp;tag=gillondata-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399349&amp;creativeASIN=0470583789" target="_blank">Jim Sterne&#8217;s Social Media Metrics book</a>, as I recall that it was an intriguing non-sequitur that considered turning the entire &#8220;tracking&#8221; model on its head. Food for thought for another post, that.</p>
<p>What do you think? Is this an effective representation of the shifting privacy landscape we&#8217;re dealing with? What does it miss?</p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2011. |
<a href="http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/#comments">3 comments</a> |
Add to
<a href="http://del.icio.us/post?url=http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/&amp;title=Privacy: It&#8217;s a 2.5-Dimensional Issue">del.icio.us</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/privacy/" rel="tag">privacy</a><br/>
</small></p>
<p><small>Feed enhanced by <a href='http://planetozh.com/blog/my-projects/wordpress-plugin-better-feed-rss/'>Better Feed</a> from  <a href='http://planetozh.com/blog/'>Ozh</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2011/04/26/privacy-its-a-2-5-dimensional-issue/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Answering the &#8220;Why doesn&#8217;t the data match?&#8221; Question</title>
		<link>http://www.gilliganondata.com/index.php/2010/05/18/answering-the-why-doesnt-the-data-match-question/</link>
		<comments>http://www.gilliganondata.com/index.php/2010/05/18/answering-the-why-doesnt-the-data-match-question/#comments</comments>
		<pubDate>Tue, 18 May 2010 17:05:51 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Metrics]]></category>
		<category><![CDATA[match]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=744</guid>
		<description><![CDATA[Anyone who has been working with web analytics for more than a week or two has inevitably asked or been asked to explain why two different numbers that &#8220;should&#8221; match don&#8217;t: Banner ad clickthroughs reported by the ad server don&#8217;t match the clickthroughs reported by the web analytics tool Visits …]]></description>
			<content:encoded><![CDATA[<p>Anyone who has been working with web analytics for more than a week or two has inevitably asked or been asked to explain why two different numbers that &#8220;should&#8221; match don&#8217;t:</p>
<ul style="text-align: center;">
<li style="text-align: left;">Banner ad clickthroughs reported by the ad server don&#8217;t match the clickthroughs reported by the web analytics tool</li>
<li style="text-align: left;">Visits reported by one web analytics tool don&#8217;t match visits reported by another web analytics tool running in parallel</li>
<li style="text-align: left;">Site registrations reported by the web analytics tool don&#8217;t match the number of registrations reported in the CRM system</li>
<li style="text-align: left;">Ecommerce revenue reported by the web analytics tool doesn&#8217;t match that reported from the enterprise data warehouse</li>
</ul>
<p style="text-align: left;">In most cases, the &#8220;don&#8217;t match&#8221; means +/- 10% (or maybe +/- 15%). And, seasoned analysts have been rattling off all the reasons the numbers don&#8217;t match for years. Industry guru <a title="Brian Clifton" href="http://twitter.com/brianclifton" target="_blank">Brian Clifton</a> has written (and kept current) the most <a title="Understanding Web Analytics Accuracy" href="http://www.advanced-web-metrics.com/blog/2010/04/23/understanding-web-analytics-accuracy/" target="_blank">comprehensive of white papers on the subject</a>. It&#8217;s 19 pages of goodness, and Clifton notes:</p>
<blockquote style="text-align: center;">
<p style="text-align: left;">If you are an agency with clients asking the same accuracy questions, or an in-house marketer/analyst struggling to reconcile data sources, this accuracy whitepaper will help you move forward. Feel free to distribute to clients/stakeholders.</p>
</blockquote>
<p style="text-align: left;">It can be frustrating and depressing, though, to watch the eyes of the person who insisted on the &#8220;match&#8221; explanation glaze over as we try to explain the various nuances of capturing data from the internet. After a lengthy and patient explanation, there is a pause, and then the question: &#8220;Uh-huh. But&#8230;which number is right?&#8221; I mentally flip a coin and then respond either, &#8220;Both of them&#8221; or &#8220;Neither of them&#8221; depending on how the coin lands in my head. Clifton&#8217;s paper should be required reading for any web analyst. It&#8217;s important to understand where the data is coming from and why it&#8217;s not simple and perfect. But, that level of detail is more than most marketers can (or want to) digest.</p>
<p style="text-align: left;">After trying to educate clients on the under-the-hood details&#8230;I almost always wind up at a point where I&#8217;m asked the &#8220;Well, which number is right?&#8221; question. <em>That</em> leads to a two-point explanation:</p>
<ul style="text-align: center;">
<li style="text-align: left;">The differences aren&#8217;t really material</li>
<li style="text-align: left;">What matters in many, many cases is more the trend and change over time of the measure &#8212; not its perfect accuracy (as <a title="Webtrends" href="http://webtrends.com" target="_blank">Webtrends</a> has said for years: &#8220;The trends are more important than the actual numbers. Heck, we put &#8216;trend&#8217; in our company <em>name</em>!&#8221;)</li>
</ul>
<p style="text-align: left;">This discussion, too, can have frustrating results.</p>
<p style="text-align: left;">I&#8217;ve been trying a different tactic entirely of late in these situations. I can&#8217;t say it&#8217;s been a slam dunk, but it&#8217;s had some level of results. The approach is to list out a handful of familiar situations where we get discrepant measures and are not bothered by it at all, and then use those to map back to the data that is being focussed on.</p>
<p style="text-align: left;">Here&#8217;s my list of examples:</p>
<ul style="text-align: center;">
<li style="text-align: left;"><strong>Compare your watch</strong> to your computer clock and to the time on your cell phone. Do they match? The pertinent quote, most often attributed to Mark Twain, is as follows: &#8220;A man with one watch knows what time it is; a man with two watches is never quite sure.&#8221; Even going to the <a title="NIST Time Clock" href="http://www.time.gov/" target="_blank">NIST Official U.S. Time Clock</a> will yield results that differ from your satellite-synched cell phone. These are two (or more) measures of the time that seldom match up, and we&#8217;re comfortable with a 5-10 minute discrepancy among them.</li>
</ul>
<p style="text-align: center;">
<a href="http://www.flickr.com/photos/alexkerhead/3694491125/"><img class="aligncenter size-full wp-image-747" title="watches_alexkerhead" src="http://www.gilliganondata.com/wp-content/uploads/2010/05/watches_alexkerhead.jpg" alt="" width="500" height="360" /></a><em>Photo courtesy of <a title="alexkerhead" href="http://www.flickr.com/photos/alexkerhead/" target="_blank">alexkerhead</a></em></p>
<ul style="text-align: center;">
<li style="text-align: left;"><strong>Your bathroom scale.</strong> You know you can weigh yourself as you get out of the shower first thing in the morning, but, by the time you get dressed, get to the doctor&#8217;s office, and step on the scale there, you will have &#8220;gained&#8221; 5-10 lbs. Your clothes are now on, you&#8217;ve eaten breakfast, and it&#8217;s a totally different scale, so you accept the difference. You don&#8217;t worry about how much of the difference comes from each of the contributing factors you identify. As long as you haven&#8217;t had a 20-lb swing since your last visit to the doctor, it&#8217;s immaterial.</li>
</ul>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/dno1967/4528398768/"><img class="aligncenter size-full wp-image-748" title="scale_dno1967" src="http://www.gilliganondata.com/wp-content/uploads/2010/05/scale_dno1967.jpg" alt="" width="500" height="281" /></a><em>Photo courtesy of <a title="dno1967" href="http://www.flickr.com/photos/dno1967/" target="_blank">dno1967</a></em></p>
<ul style="text-align: center;">
<li style="text-align: left;"><strong>For accountants&#8230;&#8221;revenue.&#8221;</strong> If the person with whom you&#8217;re speaking has a finance or accounting background, there&#8217;s a good chance they&#8217;ve been asked to provide a revenue number at some point and had to drill down into the details: bookings or billings? GAAP-recognized revenue? And, within revenue, there are scads of nuances that can alter the numbers slightly&#8230;but almost always in non-material ways.</li>
</ul>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/alancleaver/2750890246/"><img class="aligncenter size-full wp-image-749" title="finance_alancleaver" src="http://www.gilliganondata.com/wp-content/uploads/2010/05/finance_alancleaver.jpg" alt="" width="500" height="335" /></a><em>Photo courtesy of </em><em><a title="alancleaver_2000" href="http://www.flickr.com/photos/alancleaver/" target="_blank">alancleaver_2000</a></em></p>
<ul style="text-align: center;">
<li style="text-align: left;"><strong>Voting (recounts).</strong> In close elections, it&#8217;s common to have a recount. If the recount re-affirms the winner from the original count, then the result is accepted and everyone moves on. There isn&#8217;t a grand hullabaloo about why the recount numbers differed slightly from the original count. In really close races, where several recounts occur, the numbers <em>always</em> come back differently. And, no one knows which one is &#8220;right.&#8221; But, once there is a convergence as to the results, that is what gets accepted.</li>
</ul>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/joebeone/2266247590/"><img class="aligncenter size-full wp-image-750" title="vote_recount_joebeone" src="http://www.gilliganondata.com/wp-content/uploads/2010/05/vote_recount_joebeone.jpg" alt="" width="500" height="375" /></a><em>Photo courtesy of </em><em><a title="joebeone" href="http://www.flickr.com/photos/joebeone/" target="_blank">joebeone</a></em></p>
<p style="text-align: left;">That&#8217;s my list. I&#8217;m always looking for other analogies, though. Do you have examples that you use to explain why there&#8217;s more value in picking either number and interpreting it than in obsessing about reconciling disparate numbers?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2010/05/18/answering-the-why-doesnt-the-data-match-question/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Reporting Tools Can&#8217;t Fix Bad Data</title>
		<link>http://www.gilliganondata.com/index.php/2010/03/10/reporting-tools-cant-fix-bad-data/</link>
		<comments>http://www.gilliganondata.com/index.php/2010/03/10/reporting-tools-cant-fix-bad-data/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 14:30:27 +0000</pubDate>
		<dc:creator>tgwilson_php</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Stephen Few]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=713</guid>
		<description><![CDATA[Stephen Few wrote a brilliant (and rather scathing) post recently: Big BI is Stuck: Illustrated by SAP BusinessObjects Explorer. In the post, he extensively quotes marketingspeak from various SAP executives and then picks apart their claims. He follows that part of the post with excerpts from a review of their …]]></description>
			<content:encoded><![CDATA[<p>Stephen Few wrote a brilliant (and rather scathing) post recently: <a title="Big BI is Stuck: Illustrated by SAP BusinessObjects Explorer" href="http://www.perceptualedge.com/blog/?p=727" target="_blank">Big BI is Stuck: Illustrated by SAP BusinessObjects Explorer</a>. In the post, he extensively quotes marketingspeak from various SAP executives and then picks apart their claims. He follows that part of the post with excerpts from a review of their new BusinessObjects Explorer, which highlights that (alas!) the tool is not the killer app that makes access to all data easy and intuitive. Of course, SAP is by no means the first company to underdeliver on that promise!</p>
<p>The quote that really jumped out for me in the post, though, was this one:</p>
<blockquote><p>Anyone who understands BI, however, knows that no interface, no matter how magical, will give you access to data that isn’t available, will clean data that is dirty, or will simplify the navigation of complicated operational databases.</p></blockquote>
<p>That quote alone warranted a mini-blog post, if for no other reason than to allow me to quickly get my hands on it in the future when I&#8217;m in the midst of bashing my head against a cinder block wall of requests for crisp, clean analytic insights from a messy, messy world of data.</p>
<p>Mr. Few, I already had you placed high on a pedestal. Please accept this footstool upon which you can perch to be raised up just a little bit higher thanks to the clarity and insight within that one sentence you have penned!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2010/03/10/reporting-tools-cant-fix-bad-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Spectrum of Data Sources for Marketers Is Wide (and Overwhelming)</title>
		<link>http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 14:00:01 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[Web Analytics]]></category>
		<category><![CDATA[data sources]]></category>
		<category><![CDATA[Malcolm Gladwell]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=631</guid>
		<description><![CDATA[I&#8217;ve been using an anecdote of late that Malcolm Gladwell supposedly related at a SAS user conference earlier this year: over the last 30 years, the challenge we face when it comes to using data to drive actions has fundamentally shifted from a challenge of &#8220;getting the right data&#8221; to …]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using an anecdote of late that <a title="Malcolm Gladwell" href="http://en.wikipedia.org/wiki/Malcolm_Gladwell" target="_blank">Malcolm Gladwell</a> supposedly related at a SAS user conference earlier this year: over the last 30 years, the challenge we face when it comes to using data to drive actions has fundamentally shifted from a challenge of &#8220;getting the right data&#8221; to &#8220;looking at an overwhelming array of data in the right way.&#8221; To illustrate, he compared Watergate to Enron &#8212; in the former case, the challenge for Woodward and Bernstein was uncovering a relatively small bit of information that, once revealed, led to immediate insight and swift action. In the latter case, the data to show that Enron had built a house of cards was publicly available, but there was so much data that actually figuring out how to extract the underlying chicanery without knowing exactly where to look for it was next to impossible.</p>
<p>With that in mind, I started thinking about all of the sources of data that marketers now have available to them to drive their decisions. The challenge is that almost all of the data sources out there are <em>good</em> tools &#8212; while they all claim competitive advantage and differentiation from other options&#8230;I believe in the free market to the extent that truly <em>bad</em> tools don&#8217;t survive (do a Google search for &#8220;SPSS Netgenesis&#8221; and the first link returned is a 404 page &#8212; the prosecution rests!). To avoid getting caught up in the shiny baubles of any given tool, it seems worth organizing the range of available data in some way &#8212; put every source into a discrete bucket. It turns out that that&#8217;s a pretty tricky thing to do, but one approach would be to put each data source available to us somewhere on a broad spectrum. At one end of the spectrum is data from secondary research &#8212; data that someone else has gone out and gathered about an industry, a set of consumers, a trend, or something else. At the other end of the spectrum is the data we collect on our customers in the course of conducting some sort of transaction with them &#8212; when someone buys a widget from our web site, we know their name, how they paid, what they bought, and when they bought it!</p>
<p>For poops and giggles, why not try to fill in that spectrum? Starting from the secondary research end, here we go&#8230;!</p>
<h3>Secondary Research (and Journalism&#8230;even Journalism 2.0)</h3>
<p>This category has an unlistable number of examples. From analyst firms like <a title="Forrester Research" href="http://www.forrester.com/rb/research" target="_blank">Forrester Research</a> and <a title="Gartner Group" href="http://www.gartner.com/technology/home.jsp" target="_blank">Gartner Group</a>, to trade associations like the <a title="American Marketing Association" href="http://www.marketingpower.com/Pages/default.aspx" target="_blank">AMA</a> or <a title="The ARF" href="http://www.thearf.org/" target="_blank">The ARF</a>, to straight-up journalists and trade publications, and even to bloggers. Specialty news aggregators like <a title="Alltop.com" href="http://alltop.com/" target="_blank">alltop.com</a> fall into this category as well (even if, technically, they would fit better into a &#8220;tertiary research&#8221; category, I&#8217;m going to just leave them here!).</p>
<p>I stumbled across <a title="iconoculture" href="http://iconoculture.com/" target="_blank">iconoculture</a> last week as one interesting company that falls in this category&#8230;although things immediately start to get a little messy, because they&#8217;ve got some level of primary research as well as some tracking/listening aspects of their offer.</p>
<h3>Listening/Collecting</h3>
<p>Moving along our spectrum of data sources, we get to an area that is positively exploding. These are tools that are almost always built on top of a robust database, because what they do is try to gather and organize what people &#8212; consumers &#8212; are doing/saying online. As a data source, these are still inherently &#8220;secondary&#8221; &#8212; they&#8217;re &#8220;what&#8217;s happening&#8221; and &#8220;what&#8217;s out there.&#8221; But, as our world becomes increasingly digital, this is a powerful source of information.</p>
<p>One group of tools here includes sites like <a title="compete.com" href="http://compete.com" target="_blank">compete.com</a>, <a title="Alexa" href="http://alexa.com" target="_blank">Alexa</a>, and even Google&#8217;s various &#8220;insights&#8221; tools: <a title="Google Trends" href="http://www.google.com/trends" target="_blank">Google Trends</a>, <a title="Google Trends for Websites" href="http://trends.google.com/websites?q=wikipedia.org" target="_blank">Google Trends for Websites</a>, and <a title="Google Insights for Search" href="http://www.google.com/insights/search/#" target="_blank">Google Insights for Search</a>. These tools tend to not be so much consumer-focussed as site-focussed, but they&#8217;re getting their data by collecting what consumers are doing. And they are <em>darn</em> handy.</p>
<p>&#8220;Online listening platforms&#8221; are a newer beast, and there seems to be a new player in the space every day. The <a title="Forrester Wave - Listening Platforms - Q1 2009" href="http://www.nielsen-online.com/emc/0901_forrester/The%20Forrester%20Wave%20Listening%20Platforms%20Q1.pdf" target="_blank">Forrester Wave report by Suresh Vittal</a> in Q1 2009 seems like it is at least five years old. An incomplete list of companies/tools offering such platforms includes (in no particular order&#8230;except Nielsen is first because they&#8217;re the source of the registration-free PDF of the Forrester Wave report I just mentioned):</p>
<ul>
<li><a title="Nielsen Buzzmetrics" href="http://en-us.nielsen.com/tab/product_families/nielsen_buzzmetrics" target="_blank">Nielsen Buzzmetrics</a></li>
<li><a title="Buzzlogic" href="http://www.buzzlogic.com/" target="_blank">BuzzLogic</a></li>
<li><a title="Radian6" href="http://www.radian6.com/" target="_blank">Radian6</a></li>
<li><a title="SM2" href="http://alterian.com/products/social_media_monitoring-1.aspx" target="_blank">Alterian/Techrigy SM2</a></li>
<li><a title="Filtrbox" href="http://www.filtrbox.com/" target="_blank">Filtrbox</a></li>
<li><a title="Crimson Hexagon" href="http://www.crimsonhexagon.com/home/" target="_blank">Crimson Hexagon</a></li>
<li><a title="Collective Intellect" href="http://www.collectiveintellect.com/" target="_blank">Collective Intellect</a></li>
<li><a title="Spiderfly" href="http://www.webbedmarketing.com/socialmediamonitoring.html" target="_blank">Spiderfly</a></li>
</ul>
<p>And the list goes on and on and on&#8230; (see Marshall Sponder&#8217;s post: <a title="26 Tools for Social Media Monitoring" href="http://www.webmetricsguru.com/archives/2009/12/26-tools-for-social-media-monitoring/" target="_blank">26 Tools for Social Media Monitoring</a>). Each of these tools differentiates itself from its competition in some way, but none of them has truly emerged as a sustained frontrunner.</p>
<h3 style="font-size: 1.17em;">Web Analytics</h3>
<p>I put web analytics next on the spectrum, but recognize that these tools have an internal spectrum all their own. From the &#8220;listening/collecting&#8221; side of the spectrum, web analytics tools simply &#8220;watch&#8221; activity on your web site &#8212; how many people went where and what they did when they got there. Moving towards the &#8220;1:1 transactions&#8221; end of the spectrum, web analytics tools collect data on specifically identifiable visitors to your site and provide that user-level specificity for analysis and action.</p>
<p><a title="Google Analytics" href="http://google.com/analytics" target="_blank">Google Analytics</a> pretty much resides at the &#8220;watching&#8221; end of this list, as does <a title="Yahoo! Web Anaytics / IndexTools" href="http://web.analytics.yahoo.com/" target="_blank">Yahoo! Web Analytics</a> (formerly IndexTools). But, then again, they&#8217;re free, and there&#8217;s a lot of power in effectively watching activity on your site, so that&#8217;s not a knock against them. The other major players &#8212; <a title="Omniture Sitecatalyst" href="http://www.omniture.com" target="_blank">Omniture Sitecatalyst</a>, <a title="Webtrends" href="http://www.webtrends.com" target="_blank">Webtrends</a>, <a title="Coremetrics" href="http://www.coremetrics.com" target="_blank">Coremetrics</a>, and the like &#8212; have more robust capabilities and can cover the full range of this mini-spectrum. They all are becoming increasingly open and more able to be integrated with other systems, be that with back-end CRM or marketing automation systems, or be that with the listening/collecting tools described in the prior section.</p>
<p>The list above covered &#8220;traditional web analytics,&#8221; but that field is expanding. A/B and multivariate testing tools fall into this category, as they &#8220;watch&#8221; with a very specific set of options for optimizing a specific aspect of the site. <a title="Optimost" href="http://www.optimost.com/" target="_blank">Optimost</a>, <a title="Omniture Test&amp;Target" href="http://www.omniture.com/en/products/conversion/testandtarget" target="_blank">Omniture Test&amp;Target</a>, and <a title="Google Website Optimizer" href="http://www.google.com/websiteoptimizer" target="_blank">Google Website Optimizer</a> all fall into this subcategory.</p>
<p>And, entire companies have popped up to fill specific niches with which traditional web analytics tools have struggled. My favorite example there is <a title="Clearsaleing" href="http://www.clearsaleing.com" target="_blank">Clearsaleing</a>, which uses technology very similar to all of the web analytics tools to <em>capture</em> data, but whose tools are built specifically to provide a meaningful view into campaign performance across multiple touchpoints and multiple channels. The niche their tool fills is improved &#8220;attribution management&#8221; &#8212; there&#8217;s even been a <a title="Interactive Attribution Forrester Wave" href="http://www.clearsaleing.com/attributionwave/" target="_blank">Forrester Wave devoted entirely to tools that try to do that</a> (registration required to download the report from Clearsaleing&#8217;s site).</p>
<h3 style="font-size: 1.17em;">Primary Research</h3>
<p>At this point on the spectrum, we&#8217;re talking about tools and techniques for collecting very specific data from consumers &#8212; going in with a set of questions that you are trying to get answered. Focus groups, phone surveys, and usability testing all fall in this area, as well as a plethora of online survey tools. Specifically, there are online survey tools designed to work with your web site &#8212; <a title="Foresee Results" href="http://foreseeresults.com" target="_blank">Foresee Results</a> and <a title="iPerceptions 4Q Survey" href="http://www.4qsurvey.com/" target="_blank">iPerceptions 4Q</a> are two that are solid for different reasons, but the list of tools in that space outnumbers even the list of online listening platforms.</p>
<p>The challenge with primary research is that you have to make the user aware that you are collecting information for the purpose of research and analysis. That drops a fly in the data ointment, because it is <em>very</em> easy to bias that data by not constructing the questions and the environment correctly. Even with a poorly designed survey, you will collect some powerful data &#8212; the problem is that the data may be misleading!</p>
<h3 style="font-size: 1.17em;">Transaction Data</h3>
<p>Beyond even primary research is the terminus of the spectrum &#8212; it&#8217;s customer data that you collect every day as a byproduct of running your business and interacting with customers. Whenever a customer interacts with your call center or makes a purchase on your web site, they are generating data as an artifact. When you send an e-mail to your database, you&#8217;ve generated data as to whom you sent the message&#8230;and many e-mail tools also track who opened and clicked through on the e-mail. This data can be very useful, but, to be useful, it needs to be captured, cleansed, and stored in a way that sets it up for useful analysis. There&#8217;s an entire industry built around customer data management, and most of what the tools and processes in that industry focus on is transaction data.</p>
<h3 style="font-size: 1.17em;">What&#8217;s Missing?</h3>
<p>As much as I would like to wrap up this post by congratulating myself on providing an all-encompassing framework&#8230;I can&#8217;t. While there are a lot of specific tools/niches that I haven&#8217;t listed here that I could fit somewhere on the spectrum of tools as I&#8217;ve described it, there are also sources of valuable data that don&#8217;t fit in this framework. One type that jumps out to me is marketing mix-type data and tools (think <a title="Analytic Partners" href="http://www.analyticpartners.com/" target="_blank">Analytic Partners</a>, <a title="ThinkVine" href="http://www.thinkvine.com" target="_blank">ThinkVine</a>, or <a title="MarketShare Partners" href="http://marketsharepartners.com/" target="_blank">MarketShare Partners</a>). I&#8217;m sure there are <em>many</em> other types. Nevertheless, it seems like a worthwhile framework to have when it comes to building up a portfolio of data sources. Are you getting data from across the entire spectrum (there are free or near-free tools at every point on the spectrum)? Are you getting redundant data?</p>
<p>What do you think? Is it possible to organize &#8220;all data sources for marketers&#8221; in a meaningful way? Is there value in doing so?</p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/#comments">One comment</a> |
Add to
<a href="http://del.icio.us/post?url=http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/&amp;title=The Spectrum of Data Sources for Marketers Is Wide (and Overwhelming)">del.icio.us</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/data-sources/" rel="tag">data sources</a>, <a href="http://www.gilliganondata.com/index.php/tag/malcolm-gladwell/" rel="tag">Malcolm Gladwell</a><br/>
</small></p>
<p><small>Feed enhanced by <a href='http://planetozh.com/blog/my-projects/wordpress-plugin-better-feed-rss/'>Better Feed</a> from  <a href='http://planetozh.com/blog/'>Ozh</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/12/14/the-spectrum-of-data-sources-for-marketers-is-wide-and-overwhelming/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Inertia of the Status Quo</title>
		<link>http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 14:04:11 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[cognitive dissonance]]></category>
		<category><![CDATA[intertia]]></category>
		<category><![CDATA[James Surowiecki]]></category>
		<category><![CDATA[status quo]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=514</guid>
		<description><![CDATA[Some definitions (courtesy of Wiktionary): status quo &#8212; the way things are, as opposed to the way they could be inertia &#8212; The property of a body that resists any change to its uniform motion cognitive dissonance &#8212; a conflict or anxiety resulting from inconsistencies between one&#8217;s beliefs and one&#8217;s actions or other …]]></description>
			<content:encoded><![CDATA[<p>Some definitions (courtesy of <a title="Wiktionary" href="http://en.wiktionary.org/wiki/Wiktionary:Main_Page" target="_blank">Wiktionary</a>):</p>
<ul>
<li><a title="status quo" href="http://en.wiktionary.org/wiki/status_quo" target="_blank">status quo</a> &#8212; the way things are, as opposed to the way they could be</li>
<li><a title="inertia" href="http://en.wiktionary.org/wiki/inertia" target="_blank">inertia</a> &#8212; The property of a body that resists any change to its uniform motion</li>
<li><a title="cognitive dissonance" href="http://en.wiktionary.org/wiki/cognitive_dissonance" target="_blank">cognitive dissonance</a> &#8212; a conflict or anxiety resulting from inconsistencies between one&#8217;s beliefs and one&#8217;s actions or other beliefs</li>
</ul>
<p><img style="border: 0pt none; float:left;  padding-right:20px; padding-bottom:10px" title="OlofS_boulders" src="http://www.gilliganondata.com/wp-content/uploads/2009/09/OlofS_boulders.jpg" alt="OlofS_boulders" width="240" height="180" />The first two of these can be applied to any sort of technology or process change being introduced to an organization &#8212; entire careers and companies are built around trying to figure out how to effectively drive change within organizations. In the case of data management, the third definition &#8212; cognitive dissonance &#8212; comes into play as well.</p>
<p>As a brilliant and phenomenally handsome man* once said, &#8220;Customers are people, and people are messy.&#8221; Customer data is inherently incomplete and imperfect. Any process or system that captures and stores customer data stores flawed data as soon as it rolls out for two reasons:</p>
<ul>
<li>It is not reasonable to add to any process all of the overhead required to rigorously capture and validate all attributes of a customer &#8212; it&#8217;s a balancing act between the efficiency of the process and the quality of the data captured</li>
<li>Customer data decays, and it decays a lot more quickly than we like to admit; customer data <em>maintenance</em> tends to be an afterthought that gets addressed only after time has degraded the data to the point that it starts causing the company real problems</li>
</ul>
<p>Once we hit the point where we really need to tackle our customer data management head on, we have two options, of which one option is completely inviable:</p>
<ol>
<li>Throw out all of our customer data, customer data processes, and customer data systems and start over, but &#8220;do it right this time&#8221;</li>
<li>Identify the most broken parts of our processes and start fixing them &#8212; going after the lowest cost and highest benefit ones first and then working our way down the list until we hit a satisfactory point (which is, typically, never)</li>
</ol>
<p>Clearly, the first option is not an option. No company would survive if they tossed out their customer base, barred their doors, and conducted no business for a year or two while they rebuilt their process and technology infrastructure.</p>
<p>That leaves us with the second option (technically, &#8220;do nothing&#8221; is an option as well, but that&#8217;s only an option if the pain hasn&#8217;t reached the point where it&#8217;s <em>not</em> an option!), and, thus, we reach a cognitive dissonance conundrum:</p>
<blockquote><p>We know our customer data is dirty &#8212; customer service reps complain about the number of duplicate records in their systems, sales reps complain of the incomplete pictures they have of their customers (which hinders their ability to prep for and conduct customer visits), marketing complains that they can&#8217;t effectively segment and target their database because the customer data is bad, <em>customers</em> complain because the company keeps screwing things up in one way or another&#8230;</p>
<p style="text-align: center;"><strong>BUT</strong></p>
<p>&#8230;as we start to explore and design replacement processes, we realize that these processes are going to be inherently imperfect, too. We may accept that the new process will be <em>better</em> (even significantly so), but we obsess about the flaws.</p></blockquote>
<p>We don&#8217;t want to repeat the mistakes of the past and roll out something that is not bulletproof &#8212; a chink in the data management armor is a chink, no matter how small. So we obsess about the chinks. We propose process changes to accommodate the identified gaps. <em>Even for the gaps that are purely theoretical</em> (&#8220;yes, I see, but <em>what if</em> the poles reversed at the exact same point that pigs learned to fly &#8212; the process would break!&#8221;). We&#8217;re trying to do the right thing. We&#8217;re aiming for perfection &#8212; for a flawless process.</p>
<p>But we&#8217;re talking about customer data, and customers are people, and people are messy.</p>
<p>We find ourselves (and/or the people who will ultimately need to adopt the process) paralyzed, caught in an endless cycle of Visio vetting and process rework, perpetually getting halfway to the &#8220;perfect&#8221; process, but never actually getting there. At some point, due to impatience or frustration, someone stands up and yells, &#8220;Enough! Just build what you&#8217;ve got!&#8221;</p>
<p>And then we realize we&#8217;ve designed a process that is so complex and unwieldy that the cost to implement it would wipe out any hope of the company having a profitable year for the ensuing decade.</p>
<p>Of <em>course</em> you&#8217;d like a more tangible example:</p>
<p>Let&#8217;s say we&#8217;re trying to clean up our customers&#8217; mailing addresses (which, thankfully, is now an exercise from my <em>past</em>, but that&#8217;s more a digression for a discussion over drinks than for a blog post!). Let&#8217;s say that, for any 1,000 customer addresses, we have conclusively demonstrated that at least 50 of them are bad &#8212; the postal service is going to struggle to deliver mail sent to them, and the postal service is going to fail more often than not. Now, let&#8217;s also say that we&#8217;ve demonstrated that, by introducing some automated cleansing processes, we can: 1) identify those 50 addresses, 2) &#8220;fix&#8221; 30 of them, and 3) flag the remaining 20 as being known problems that need some sort of manual intervention. Let&#8217;s say that, rather than 1,000 records, we&#8217;re talking about 10 million.</p>
<p>&#8220;Hurray!&#8221;</p>
<p>&#8220;Sounds great!&#8221;</p>
<p>&#8220;<em>Awesome</em>!&#8221;</p>
<p>&#8220;Gimme some of <em>that!</em>&#8221;</p>
<p>Ah&#8230;<strong>BUT&#8230;</strong></p>
<p>&#8230;we have also determined that, as part of those automated cleansing processes, we <em>might</em> actually take <em>1</em> of the 950 addresses that were already good&#8230;and make it worse.</p>
<p>Logically, the project should still be a go. We&#8217;re making 30 addresses better and only <em>might</em> be making a single address worse!</p>
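<p>To put rough numbers on that trade-off, here is a minimal Python sketch of the arithmetic. It assumes (my assumption, not something demonstrated above) that the per-1,000 figures scale linearly to the full database, and it treats the single &#8220;might be worse&#8221; address as a worst case that always happens. The names <code>PER_THOUSAND</code> and <code>scale_outcomes</code> are made up for illustration.</p>

```python
# Figures from the example above, expressed per 1,000 customer addresses.
PER_THOUSAND = {"bad": 50, "fixed": 30, "flagged": 20, "possibly_worsened": 1}


def scale_outcomes(total_records: int) -> dict:
    """Scale the per-1,000 figures to a database of total_records addresses.

    Assumes the rates hold uniformly across the whole database.
    """
    factor = total_records // 1000
    scaled = {name: count * factor for name, count in PER_THOUSAND.items()}
    # Worst case: every address that "might" get worse really does get worse.
    scaled["net_improved_worst_case"] = scaled["fixed"] - scaled["possibly_worsened"]
    return scaled


outcomes = scale_outcomes(10_000_000)
for name, count in outcomes.items():
    print(f"{name}: {count:,}")
```

<p>Even in the worst case, the addresses made better outnumber the ones made worse by 30 to 1, which is the whole point of the paragraph above.</p>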
<p><img style="border: 0pt none; float:right;  padding-left:10px; padding-bottom:10px" title="bobster855_unhappyman" src="http://www.gilliganondata.com/wp-content/uploads/2009/09/bobster855_unhappyman.jpg" alt="bobster855_unhappyman" width="240" height="146" /></p>
<p>Ohhhhh&#8230;that single address. That molehill that eats its Wheaties, regularly applies cream provided by a shady character, and injects itself in the buttocks with a substance its cousin purchased over the counter in the Dominican Republic. The molehill grows. It grows quickly. Suspiciously quickly&#8230;yet no one seems to notice. It becomes a hill, and then a big hill, and then a mountain! The project manager is left scratching his head and wondering how a theoretical aside in a meeting three weeks ago has now become a virtually insurmountable issue that has put the entire project at risk of ever being implemented!</p>
<p>Cognitive dissonance &#8212; simultaneously recognizing that things are bad and must be fixed, but also accepting that the status quo is &#8220;right.&#8221;</p>
<p>The answer? I&#8217;d like to say it&#8217;s just a matter of putting the dissonant perspectives side by side and forcing objectors to reconcile them. That should work, right?</p>
<p>Alas!</p>
<p>As it happens, the current debate about healthcare reform in the U.S. prompted James Surowiecki to write a column on <a title="Status-Quo Anxiety" href="http://www.newyorker.com/talk/financial/2009/08/31/090831ta_talk_surowiecki" target="_blank">Status-Quo Anxiety</a> in <em>The New Yorker</em> a couple of weeks ago. Surowiecki discusses the &#8220;endowment effect:&#8221;</p>
<blockquote><p>&#8220;&#8230;the mere fact that you own something leads you to overvalue it. A simple demonstration of this was <a title="Anomalies: The Endowment Effect, Loss Aversion, and Status Quo Bias" href="http://faculty.chicagobooth.edu/richard.thaler/research/articles/1-The_Endowment_Effect_Loss_Aversion_and_Status_Quo_Bias.pdf" target="_blank">an experiment</a> in which some students in a class were given coffee mugs emblazoned with their school’s logo and asked how much they would demand to sell them, while others in the class were asked how much they would pay to buy them. Instead of valuing the mugs similarly, the new owners of the mugs demanded more than twice as much as the buyers were willing to pay.&#8221;</p></blockquote>
<p>Surowiecki goes on to relate this effect to the healthcare debate: &#8220;What this suggests about health care is that, if people have insurance, most will value it highly, no matter how flawed the current system.&#8221;</p>
<p>The same applies to customer data management all too often &#8212; we know we have a flawed system, but it&#8217;s the system we have, gosh darn it, and I don&#8217;t want your new system if I can find <em>any </em>imperfections in it!</p>
<p>This really has been a farewell post of sorts. Rambling, yes. Academic, yes. Lacking any prescriptive solution. But, hopefully at least a little entertaining, and maybe even with an insight or two that may come in handy to you. Look for a topical shift to measuring digital media going forward.</p>
<p>So long, and thanks for the fish!</p>
<p>* Dramatic license &#8212; <em>I</em> said that in <a href="http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/" target="_blank">this post</a>, and &#8220;brilliant and phenomenally handsome&#8221; is perhaps a bit of an overstatement.</p>
<p style="text-align: right; "><em>Photos courtesy of </em><a title="Olof S on Flickr" href="http://www.flickr.com/photos/venteco/" target="_blank"><em>Olof S</em></a><em> and </em><a title="bobster855 on Flickr" href="http://www.flickr.com/photos/32912172@N00/" target="_blank"><em>bobster855</em></a></p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/#comments">No comment</a> |
Add to
<a href="http://del.icio.us/post?url=http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/&amp;title=The Inertia of the Status Quo">del.icio.us</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/cognitive-dissonance/" rel="tag">cognitive dissonance</a>, <a href="http://www.gilliganondata.com/index.php/tag/intertia/" rel="tag">intertia</a>, <a href="http://www.gilliganondata.com/index.php/tag/james-surowiecki/" rel="tag">James Surowiecki</a>, <a href="http://www.gilliganondata.com/index.php/tag/status-quo/" rel="tag">status quo</a><br/>
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/09/04/the-inertia-of-the-status-quo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Type I vs. Type II Errors in Customer Data Management</title>
		<link>http://www.gilliganondata.com/index.php/2009/08/18/type-i-vs-type-ii-errors-in-customer-data-management/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/08/18/type-i-vs-type-ii-errors-in-customer-data-management/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 14:00:06 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[CDI]]></category>
		<category><![CDATA[customer data integration]]></category>
		<category><![CDATA[false negative]]></category>
		<category><![CDATA[false positive]]></category>
		<category><![CDATA[Type I error]]></category>
		<category><![CDATA[Type II error]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=496</guid>
		<description><![CDATA[Last week, I ended a post promising a future post on Type I vs. Type II errors when it comes to customer data management. I&#8217;ve found myself running into confusion on the distinction, with all customer data errors being treated as the same type, when they are not. Let&#8217;s Start …]]></description>
			<content:encoded><![CDATA[<p>Last week, I ended <a title="Rare x Rare x Rare in Customer Data Management" href="http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/" target="_blank">a post</a> promising a future post on Type I vs. Type II errors when it comes to customer data management. I&#8217;ve found myself running into confusion on the distinction, with <em>all</em> customer data errors being treated as the same type, when they are not.</p>
<h2>Let&#8217;s Start with Definitions</h2>
<p>Wikipedia has a <a title="Type I and Type II Errors" href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors" target="_blank">nice write-up</a> that goes into much deeper detail, but here are my quick and crude definitions:</p>
<ul>
<li>Type I Error = α (Alpha) Error = False Positive = you mistakenly judge something to be true that, in fact, is false</li>
<li>Type II Error = β (Beta) Error = False Negative = you mistakenly judge something to be false that, in fact, is true</li>
</ul>
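<p>If it helps to see those two definitions as logic rather than Greek letters, here is a tiny Python sketch. The function name <code>classify_judgment</code> is purely illustrative.</p>

```python
def classify_judgment(judged_true: bool, actually_true: bool) -> str:
    """Label a single judgment using the quick-and-crude definitions above."""
    if judged_true and not actually_true:
        return "Type I error (false positive)"
    if not judged_true and actually_true:
        return "Type II error (false negative)"
    # The judgment agreed with reality, so there is no error to label.
    return "correct"


# Walk through all four combinations of judgment and reality.
for judged in (True, False):
    for actual in (True, False):
        print(f"judged={judged}, actual={actual}: {classify_judgment(judged, actual)}")
```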
<p><img class="aligncenter size-full wp-image-500" title="Type I vs. Type II Errors" src="http://www.gilliganondata.com/wp-content/uploads/2009/08/TypeI_TypeII1.JPG" alt="Type I vs. Type II Errors" width="539" height="286" /></p>
<p>It&#8217;s easy to get mired in statistics-speak and make a snap judgment that this distinction between types of data errors is solely of interest and use to statisticians. That&#8217;s not the case. Depending on the situation, one type can be critical while the other can be barely consequential.</p>
<h2>Customer Data Example: Automated Customer Merges</h2>
<p>Most companies of any size battle duplicates in their customer data. I have yet to work at or with a company where the sales force and call center staff don&#8217;t complain about the number of duplicates that exist in internal systems. In this situation, it&#8217;s pretty common to try to automate a deduplication routine of some sort (some would say that this is the biggest benefit of a <a title="Customer Data Integration" href="http://en.wikipedia.org/wiki/Customer_Data_Integration" target="_blank">customer data integration</a>, or CDI, initiative).</p>
<p>Unfortunately, customer data is messy. For every &#8220;identical match&#8221; in the system, there are generally 5 to 10 &#8220;probable matches.&#8221; After all, if the data were truly identical, then the likelihood that a duplicate would have gotten created in the first place would have been greatly diminished! That means that the automated matching system has to apply various business rules to determine which matches are true matches and which ones are not &#8212; the system makes an educated guess. Many CDI systems apply a scoring system &#8212; the higher the score, the more likely the two records are a true match. The lower the score, the less likely. It&#8217;s then up to the user to establish the threshold above which the records will automatically be merged (and, possibly, a lower threshold that will trigger a manual review of the records by a human being).</p>
<p>Where that threshold gets set is <em>purely</em> a Type I vs. Type II error decision:</p>
<ul>
<li>The higher the threshold gets set, the greater the likelihood of a Type II error &#8212; a false negative where two records could have been merged because they were, in fact, duplicates, but they were not identified as such</li>
<li>The lower the threshold gets set, the greater the likelihood of a Type I error &#8212; a false positive where two records get merged, even though, in reality, they represent two different customers</li>
</ul>
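<p>The two-threshold setup described above can be sketched in a few lines of Python. This is not how any particular CDI product works: the 0 to 100 score scale, the default threshold values, and the name <code>merge_decision</code> are all assumptions made up for illustration.</p>

```python
def merge_decision(score: float, auto_merge_at: float = 90.0, review_at: float = 70.0) -> str:
    """Decide what to do with a pair of records given their match score.

    Assumes a hypothetical 0-100 scale where a higher score means the two
    records are more likely to be the same customer.
    """
    if score >= auto_merge_at:
        return "merge"            # lowering this threshold invites Type I errors
    if score >= review_at:
        return "manual review"    # a human being makes the call
    return "keep separate"        # raising the thresholds invites Type II errors


for score in (97, 82, 55):
    print(score, merge_decision(score))
```

<p>Sliding <code>auto_merge_at</code> up or down is exactly the Type I vs. Type II dial described in the two bullets above.</p>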
<p>It would be <em>wonderful</em> if the matching logic was such that a threshold existed that eliminated all errors, but no such threshold will ever exist. The question then becomes: which type of error is &#8220;more&#8221; acceptable? That will influence where you set your threshold. It&#8217;s a <em>situational question</em>! For example:</p>
<ul>
<li>If your customer data includes extremely sensitive information (think medical records, social security numbers, credit card numbers), then you will want to err on the side of Type II / False Negative errors &#8212; deal with the messiness of more duplicates and put the burden on your call center / sales force to trigger merges if and only if they have confirmed the merge needs to happen through a manual inspection and/or an interaction directly with the customer</li>
<li>If, on the other hand, you are only using the post-merged customer data to send marketing promotions to the customers, then the added cost of the direct mail may push you to err on the side of Type I / False Positive errors &#8212; some of your customers may miss out on some marketing promotions, but very few of them will receive multiple copies of the same direct mail piece, and your overall postage costs will be lower</li>
</ul>
<p>This is a very real example &#8212; a single company may use the former mentality as its core matching logic, but then maintain a separate data store with more aggressive matching for purposes where Type I errors are not critical.</p>
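<p>One rough way to make the &#8220;it&#8217;s situational&#8221; point concrete is to attach a cost to each type of error and see where the cheapest threshold lands. The sketch below does that in Python. Everything in it is hypothetical: the scored pairs, the relative costs, and the function names are invented for illustration, and a real exercise would need a much larger hand-verified sample.</p>

```python
# Hypothetical hand-verified sample: (match score, truly the same customer?).
SCORED_PAIRS = [
    (98, True), (95, True), (91, False), (88, True), (84, True),
    (79, False), (75, True), (68, False), (61, True), (40, False),
]


def count_errors(threshold: int) -> tuple:
    """Return (false_positives, false_negatives) if pairs scoring at or
    above the threshold are merged automatically."""
    false_positives = sum(1 for score, same in SCORED_PAIRS if score >= threshold and not same)
    false_negatives = sum(1 for score, same in SCORED_PAIRS if threshold > score and same)
    return false_positives, false_negatives


def cheapest_threshold(cost_false_positive: float, cost_false_negative: float,
                       candidates=range(40, 101, 5)) -> int:
    """Pick the candidate threshold with the lowest total error cost
    (the lowest such threshold wins a tie)."""
    def total_cost(threshold: int) -> float:
        fp, fn = count_errors(threshold)
        return fp * cost_false_positive + fn * cost_false_negative
    return min(candidates, key=total_cost)


# Sensitive data: a wrong merge (Type I) is far worse than a leftover duplicate.
sensitive = cheapest_threshold(cost_false_positive=100, cost_false_negative=1)
# Direct mail: leftover duplicates (Type II) cost postage, wrong merges cost little.
direct_mail = cheapest_threshold(cost_false_positive=1, cost_false_negative=5)
print("sensitive data threshold:", sensitive)
print("direct mail threshold:", direct_mail)
```

<p>With these made-up numbers, the sensitive-data case lands on a much higher threshold than the direct-mail case, which matches the idea of one conservative core match plus a separate, more aggressive data store.</p>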
<h2>Type I and Type II in the News</h2>
<p>A friend and former colleague shared a recent story in <em>The Washington Post</em> about the Social Security Administration running into trouble when <a title="Social Security to Pay $500 Million To 80,000 Victims of Database Error" href="http://www.washingtonpost.com/wp-dyn/content/article/2009/08/11/AR2009081103282.html" target="_blank">trying to deny social security benefits to certain felons</a>. It sounds like there were some technical gaffes, for sure &#8212; people with minor offenses on their records got caught up in the denial of benefits. Overall, this reads like it was a case of an unacceptable number of Type I errors &#8212; people being denied benefits because they were mis-identified as &#8220;fleeing felons&#8221; (the article reads like some of the stories of people being banned from flying because they had the same name as someone on the terrorist watch list).</p>
<p>Both types of errors <em>will</em> occur. Being clear on which type is &#8220;better&#8221; given the situation (and developing your processes accordingly) is important, but ensuring that you have processes in place to fix the errors when they do occur and are identified is just as critical &#8212; something that the federal government was apparently missing in this case! The federal government? Lacking in customer service? Shocker!</p>
<p>So, there you have it &#8212; more thought than you ever wanted to see on Type I and Type II errors. I also promised in my <a title="Rare x Rare x Rare in Customer Data Management" href="http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/" target="_blank">earlier post</a> that a future post would cover my observations on cognitive dissonance and the status quo when it comes to customer data management, but that&#8217;s going to have to wait for another day!</p>
<p><strong>Similar Posts:</strong></p>
<ul class="similar-posts">
<li><a href="http://www.gilliganondata.com/index.php/2008/08/22/your-customer-data-is-dirtier-than-you-think/" rel="bookmark" title="August 22, 2008">Your Customer Data Is Dirtier than You Think</a></li>
</ul>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/08/18/type-i-vs-type-ii-errors-in-customer-data-management/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/08/18/type-i-vs-type-ii-errors-in-customer-data-management/#comments">No comment</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/cdi/" rel="tag">CDI</a>, <a href="http://www.gilliganondata.com/index.php/tag/customer-data-integration/" rel="tag">customer data integration</a>, <a href="http://www.gilliganondata.com/index.php/tag/false-negative/" rel="tag">false negative</a>, <a href="http://www.gilliganondata.com/index.php/tag/false-positive/" rel="tag">false positive</a>, <a href="http://www.gilliganondata.com/index.php/tag/type-i-error/" rel="tag">Type I error</a>, <a href="http://www.gilliganondata.com/index.php/tag/type-ii-error/" rel="tag">Type II error</a><br/>
</small></p>
<p><small>Feed enhanced by <a href='http://planetozh.com/blog/my-projects/wordpress-plugin-better-feed-rss/'>Better Feed</a> from  <a href='http://planetozh.com/blog/'>Ozh</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/08/18/type-i-vs-type-ii-errors-in-customer-data-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rare x Rare x Rare in Customer Data Management</title>
		<link>http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 14:00:37 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[equation]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=482</guid>
		<description><![CDATA[I once had an operations management professor who asked the class how often we would expect a product to be defective if it was made of 10 components, each of which had a 1% defect rate, if a single component failure would result in the entire product not working. The …]]></description>
			<content:encoded><![CDATA[<p>I once had an operations management professor who asked the class how often we would expect a product to be defective if it was made of 10 components, each of which had a 1% defect rate, if a single component failure would result in the entire product not working.</p>
<p>The math is pretty simple:</p>
<p style="text-align: center;"><strong>99% x 99% x 99% x 99% x 99% x 99% x 99% x 99% x 99% x 99% = 90.4%</strong></p>
<p>Only 90.4% of the finished products would work? That doesn&#8217;t seem good at all! Considering that there are very few manufactured products &#8212; especially electronic ones &#8212; that have only ten critical parts, it was an eye-opening insight (albeit obvious in hindsight). </p>
<p><a href="http://www.flickr.com/photos/billburris/"><img style="border: 0pt none; float:right;  padding-left:10px; padding-bottom:10px" title="Equations" src="http://www.gilliganondata.com/wp-content/uploads/2009/08/2245430380_dbd93c275f_m.jpg" alt="Equations" width="240" height="180" /></a>The point the professor was making was that there are many cases where &#8220;99% perfect&#8221; really isn&#8217;t good enough when that one part is considered in a larger context.</p>
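<p>If you want to play with the professor&#8217;s numbers, the arithmetic is a one-liner. This is just a sketch of the calculation above; the function name is made up for illustration, and it assumes component defects are independent of one another.</p>

```python
def product_yield(component_yield: float, component_count: int) -> float:
    """Share of finished products in which every component works.

    Assumes defects are independent and that any single defective
    component breaks the whole product.
    """
    return component_yield ** component_count


print(f"{product_yield(0.99, 10):.1%}")   # 90.4%
print(f"{product_yield(0.99, 100):.1%}")  # 36.6%
```

<p>The second line shows why the insight matters for real products: with 100 critical parts at the same 1% defect rate, barely a third of the finished products would work.</p>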
<p>Of late, I&#8217;ve had a few run-ins with the opposite insight. Stick with me &#8212; it&#8217;ll be fun!</p>
<p>For starters, customer data management processes are <em>not</em> automated manufacturing processes. Customers are people, and people are messy!</p>
<p>In a manufacturing environment, a key way to drive quality is to remove as much variability as possible by strictly controlling the environment. Customers (people) are none too keen about being &#8220;strictly controlled.&#8221; From a pure (read: manufacturing) customer data perspective, what we&#8217;d like is:</p>
<ul>
<li>To have every human assigned a unique ID</li>
<li>To have every human log into a system once a week and update all sorts of metadata about themselves:
<ul>
<li>Who they are related to and how (using those people&#8217;s unique IDs)</li>
<li>What products they own</li>
<li>How old they are</li>
<li>How much they weigh</li>
<li>What their favorite flavor of ice cream is</li>
<li>What political party they support</li>
<li>&#8230;and so on</li>
</ul>
</li>
<li>To enter all of this information from drop-down lists so that all of the data is structured</li>
<li>To have them be <em>very, very</em> careful when they update this information, maybe even swearing that the data is perfectly accurate, under penalty of severe consequences</li>
</ul>
<p>Obviously, that ain&#8217;t gonna happen.</p>
<p>Our processes to manage customer data are very different from manufacturing processes for one simple reason: the data does not <em>have</em> to be perfect. It has to be <em>good enough</em> for us to effectively interact with our customers.</p>
<p>Here is where a similar example to the one that started this post comes into play. When working on processes that deal with customer data &#8212; creating or maintaining it &#8212; we all develop use cases and scenarios to ensure that we are keeping the data as accurate as possible. It is exceedingly easy (and awfully tempting) to start working with scenarios that are <em>theoretically possible</em> but not very <em>probable</em>. If we&#8217;re not wearing our Hat of Practicality, we will find ourselves developing processes that are so inordinately complex that one of two things happens:</p>
<ul>
<li>We never get the new process implemented because it collapses under its own developmental weight, or</li>
<li>We implement it, but it is so complex that it collides with itself and starts generating bad customer data!</li>
</ul>
<p>Is this sounding theoretical? I&#8217;ll illustrate with an example.</p>
<p>A couple of weeks ago, I ran into an issue with a new data cleansing process we are introducing, which involves sending customer name and address data to a third-party service (all over obscenely secure channels and with no more personal information than could easily be found in a phone book or through <a title="Yahoo! People" href="http://people.yahoo.com" target="_blank">Yahoo! People</a>). During testing, we came across some unexpected behavior in how the third-party vendor handled hyphenated last names. The initial proposal was to throw out responses for any customer who had a hyphenated last name. Something seemed amiss with that approach.</p>
<p>I thought up the most plausible scenario I could where the returned data would actually be incorrect, and it looked like this (I&#8217;ll spare you the details as to why this scenario was the most plausible &#8212; just trust me):</p>
<ul>
<li>John Smith marries Mary Jones and they both keep their original surnames</li>
<li>They have a son named John Smith-Jones</li>
<li>When John Smith-Jones is a teenager, he becomes a customer of the same company of which his dad is a customer</li>
<li>When John Smith-Jones graduates from high school, he moves out of the house, while both he and his father remain customers of that same company</li>
</ul>
<p>In <em>this</em> scenario&#8230;the process would be a little broken &#8212; in a way that the customer (the father, in this case) would probably understand and would definitely be able to easily correct.</p>
<p>So, here comes the math. Without doing any research beyond my own gut-based estimates from 37 years of experience on planet Earth, I made conservative estimates for all of the variables involved:</p>
<ul>
<li>The percent of all married couples in the U.S. where both parties have kept their original surnames: 1%</li>
<li>The percent of all kids in the U.S. with hyphenated surnames: 0.5%</li>
<li>The percent of all kids in the U.S. who share the same name as their mother or father: 2%</li>
<li>How often a kid in the U.S. is a separate, <em>distinct </em>customer of the same company that his parents are (in the particular space this company is in) at the point that he/she leaves home: 75%</li>
</ul>
<p>Then comes the math. It works just the opposite from the original equation, in that it is an &#8220;AND&#8221; situation rather than an &#8220;OR&#8221; situation &#8212; <em>all</em> of these factors had to be met in order for the process to make an erroneous customer data update (as opposed to <em>any one</em> of the components having to be defective in order for the final product to be defective):</p>
<p style="text-align: center; "><strong>1% x 0.5% x 2% x 75% = 0.000075% (!!!)</strong></p>
<p>If my estimates were accurate, which they almost assuredly were not, then we would make this customer data error roughly <strong>once for every one million customers</strong>. If you think about it, you realize that the absolute accuracy of the small percentages just doesn&#8217;t really matter once those small percentages start multiplying. Let&#8217;s say I was off by a factor of <strong>four</strong> on my estimate of the percent of kids with hyphenated last names, so the formula above should have 2% where it had 0.5%. That ups the likelihood of this data error occurring to 3 times in a million rather than less than one &#8212; given the highly non-catastrophic nature of the error, this is still an &#8220;almost never&#8221; when it comes to looking at the types of other, more critical customer data errors that occur day in and day out.</p>
<p>In this case, there was another factor that I could have applied, and that was, for those one million customers, how many would be affected in any given year? 13% is a fair estimate of how many people move each year in the U.S., which means we would need to apply <em>that</em> percentage to the original result&#8230;and we&#8217;re back to &#8220;effectively never&#8221; for our likelihood of occurrence.</p>
<p>There are a couple of caveats here, and they&#8217;re important:</p>
<ul>
<li>I came up with one scenario. If there were four other plausible scenarios that were all equally likely to occur, then I would need to multiply the final result by five. In this case, we&#8217;re still talking a very small number, but there may be cases where a particular process gap could cause problems in a long tail&#8217;s worth of scenarios and may need to be viewed differently</li>
<li>It&#8217;s worth vetting the estimates somewhat &#8212; not through extensive research, necessarily, but at least by running them by a couple of sharp people to see if they pass the sniff test</li>
</ul>
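<p>For anyone who wants to plug in their own guesses, here is a small sketch of the &#8220;AND&#8221; math from this post, including the factor-of-four what-if, the 13% annual move rate, and the five-equally-likely-scenarios caveat. The percentages are my gut estimates from above; the function and variable names are invented for illustration, and the factors are assumed to be independent.</p>

```python
from math import prod


def errors_per_million(probabilities) -> float:
    """Expected bad updates per one million customers when ALL factors must hold."""
    return prod(probabilities) * 1_000_000


both_kept_surnames = 0.01
hyphenated_surname = 0.005
shares_parent_name = 0.02
separate_customer = 0.75

base = errors_per_million(
    [both_kept_surnames, hyphenated_surname, shares_parent_name, separate_customer]
)
# What if the hyphenated-surname estimate was off by a factor of four?
off_by_four = errors_per_million(
    [both_kept_surnames, 0.02, shares_parent_name, separate_customer]
)
# Only customers who move in a given year are affected (roughly 13%).
per_year = base * 0.13
# Five equally likely scenarios instead of one.
five_scenarios = base * 5

print(f"{base:.2f}")            # 0.75
print(f"{off_by_four:.2f}")     # 3.00
print(f"{per_year:.2f}")        # 0.10
print(f"{five_scenarios:.2f}")  # 3.75
```

<p>Every variation stays in the low single digits per million customers, which is the whole point: once small percentages start multiplying, the exact estimates barely matter.</p>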
<p>In this example, we were deep into testing &#8212; well past the point where code updates could be made without introducing risk to the overall implementation. To me, it was a no-brainer &#8212; proceed as planned!</p>
<p>The pushback I&#8217;ve received in other, similar situations has been: &#8220;Well, yeah, that&#8217;s only one person in a million. But&#8230;what about that one person?!&#8221; THAT gets us to my next post, which will be about Type I vs. Type II errors and cognitive dissonance when it comes to knowing that the status quo is bad while also assuming the status quo is right. More on that next time!</p>
<p style="text-align: right;"><em>Photo by </em><a title="Bill Burris Photostream" href="http://www.flickr.com/photos/billburris/" target="_blank"><em>Bill Burris</em></a></p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/#comments">5 comments</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/equation/" rel="tag">equation</a><br/>
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/08/10/rare-x-rare-x-rare-in-customer-data-management/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Data Management &#8212; As Sexy As a High Quality Mattress</title>
		<link>http://www.gilliganondata.com/index.php/2009/07/01/data-management-as-sexy-as-a-high-quality-mattress/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/07/01/data-management-as-sexy-as-a-high-quality-mattress/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 16:00:38 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Eloqua]]></category>
		<category><![CDATA[marketing automation]]></category>
		<category><![CDATA[mattress]]></category>
		<category><![CDATA[Steve Woods]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=408</guid>
		<description><![CDATA[Steve Woods of Eloqua invited me to write a guest post on his Digital Body Language blog after we&#8217;d gone back and forth a bit about contact data management and marketing automation. Over the past six or seven years, I&#8217;ve been thumped on the back of the ear with data management …]]></description>
			<content:encoded><![CDATA[<p><a title="Steve Woods Twitter" href="http://bit.ly/SCT2U" target="_blank">Steve Woods</a> of <a title="Eloqua" href="http://www.eloqua.com" target="_blank">Eloqua</a> invited me to write a guest post on his <a title="Digital Body Language" href="http://digitalbodylanguage.blogspot.com" target="_blank">Digital Body Language blog</a> after we&#8217;d gone back and forth a bit about contact data management and marketing automation. Over the past six or seven years, I&#8217;ve been thumped on the back of the ear with data management issues again and again. It always hurts, and, by the time I&#8217;ve realized I&#8217;ve got a mess&#8230;it&#8217;s a heckuva challenge to recover.</p>
<p>In my <a title="NCOA and CASS" href="http://www.gilliganondata.com/index.php/2008/12/14/ncoa-cass-aka-my-new-job/">current job</a>, I&#8217;m a full-time customer data management guy. It is <em>not</em> sexy. Like many large companies, we&#8217;ve got customer data that is created and managed in a wide range of disparate systems on diverse platforms, each with multiple decades of system evolution. It&#8217;s important. It&#8217;s painful.</p>
<p>There are some great opportunities in our increasingly electronic and e-based world to make some real headway with data management. In the case of the guest blog post, I focused on opportunities to use marketing automation tools and your web site to drive improvements in the quality of your customer data. As for how exactly I made the &#8220;high quality mattress&#8221; analogy? Click on over and <a title="Data Management Is As Sexy As a High Quality Mattress" href="http://bit.ly/FK3y5" target="_blank">check out the post</a>!</p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/07/01/data-management-as-sexy-as-a-high-quality-mattress/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/07/01/data-management-as-sexy-as-a-high-quality-mattress/#comments">One comment</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/eloqua/" rel="tag">Eloqua</a>, <a href="http://www.gilliganondata.com/index.php/tag/marketing-automation/" rel="tag">marketing automation</a>, <a href="http://www.gilliganondata.com/index.php/tag/mattress/" rel="tag">mattress</a>, <a href="http://www.gilliganondata.com/index.php/tag/steve-woods/" rel="tag">Steve Woods</a><br/>
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/07/01/data-management-as-sexy-as-a-high-quality-mattress/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Teeter-Totter of Customer Data Management</title>
		<link>http://www.gilliganondata.com/index.php/2009/05/18/the-teeter-totter-of-customer-data-management/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/05/18/the-teeter-totter-of-customer-data-management/#comments</comments>
		<pubDate>Mon, 18 May 2009 16:05:28 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[cleansing]]></category>
		<category><![CDATA[teeter-totter]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=290</guid>
		<description><![CDATA[I had a professor in business school who used to explain the relationship between the stock market and the bond market as a teeter-totter (in rural southeast Texas, I grew up knowing this as a see-saw): as the yields on one went up, the yields on the other went down …]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/jhritz/2800659404/"><img style="border: 0pt none; float:left;  padding-right:10px; padding-bottom:10px" title="Teeter-totter" src="http://www.gilliganondata.com/wp-content/uploads/2009/04/teetertotter.jpg" alt="Teeter-totter" width="180" height="240" /></a></p>
<p>I had a professor in business school who used to explain the relationship between the stock market and the bond market as a teeter-totter (in rural southeast Texas, I grew up knowing this as a see-saw): as the yields on one went up, the yields on the other went down and vice versa. </p>
<p>Managing your customer data can be like that, too &#8212; the more of a burden you put on your customers and prospects to keep your data about them clean, the less of a burden you put on yourself. And, likewise, the more of a burden you take on yourself, the less of a burden you&#8217;re putting on your customer.</p>
<p>While bouncing through links from a <a title="Steve Woods Tweet" href="http://twitter.com/stevewoods/status/1541990393">tweet</a>, I stumbled across Steve Woods&#8217;s original <a title="Contact Washing Machine" href="http://digitalbodylanguage.blogspot.com/2008/12/contact-washing-machine.html">Contact Washing Machine</a> post, and it set some alarm bells off. Steve&#8217;s a damn sharp guy &#8212; he was a co-founder and remains the CTO of <a title="Eloqua" href="http://www.eloqua.com">Eloqua</a>, and he is pretty much an undisputed visionary when it comes to marketing automation technology. Yet, this post sparked an immediate reaction, as well as teeter-totter imagery. Since then, Steve has clarified&#8230;and I think I misread his initial premise. His point is that data cleansing should happen as early in the data acquisition process as possible &#8212; cleanse the data as it comes in, rather than crossing your fingers and waiting to run batch processes after the fact in the hopes that the data will get cleaned up.</p>
<p>That&#8217;s a valid point, but, after digging deeper into the cross-links in the post, I still think there&#8217;s some under-estimating of what it takes to &#8220;fix&#8221; dirty data as it comes in. For starters, when it comes to customer/prospect data, there is typically a range of incoming data entry points:</p>
<p><strong>Web Data Entry</strong></p>
<p>In the world o&#8217; the web, data can come into your systems directly as typed by a visitor to your site &#8212; when a user is filling out a web form, for instance. On the surface, that&#8217;s a <em>great</em> place to do data validation, because you&#8217;ve got the actual user <em>right there</em> to clarify anything that has gone amiss. If he&#8217;s fat-fingered his phone number or put in an e-mail address that is clearly not valid, it&#8217;s best to prompt him right then and there to correct the mistake. But, the teeter-totter comes into play: if that piece of data is really not germane (as perceived by the user), it doesn&#8217;t take long for your cleansing to lead to a frustrated visitor to your site. Worse, if you don&#8217;t allow the user to bypass the validation step (with an &#8220;I don&#8217;t care what you think, I&#8217;ve entered the information correctly, so just keep it that way and let me move on&#8221; option), there is a very good chance that you will keep some visitors from ever getting to where they and you want them to!</p>
<p style="background-color: #edf5fa;  text-align: center;  border-color: black;  border-style: solid; border-width: thin;  padding: 15px;"><strong>If you include field validation on your web forms, and if you don&#8217;t allow the user to override that validation, it behooves you to include detailed form abandonment tracking in your web analytics to make sure you haven&#8217;t set up an insurmountable barrier for some of your customers.</strong></p>
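<p>Here is a rough sketch of what validation with an override might look like for a single e-mail field. Everything in it is an assumption for illustration: the deliberately loose pattern, the function name, and the shape of the result. The idea is simply that a suspicious value triggers a prompt once, and a user who insists gets to move on while the value is flagged for later cleansing.</p>

```python
import re

# Deliberately loose check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_email(value: str, user_confirmed: bool = False) -> dict:
    """Validate an e-mail field without ever trapping the visitor on the form."""
    if EMAIL_PATTERN.match(value):
        return {"accepted": True, "flagged": False, "prompt": None}
    if user_confirmed:
        # The visitor says it is right: keep it as typed, flag it for review.
        return {"accepted": True, "flagged": True, "prompt": None}
    return {
        "accepted": False,
        "flagged": False,
        "prompt": "That e-mail address looks off. Fix it, or confirm it is correct.",
    }


print(check_email("tim-at-example")["accepted"])                       # False
print(check_email("tim-at-example", user_confirmed=True)["accepted"])  # True
```

<p>Counting how often the prompt fires, and how often visitors leave right after it, gives you the form abandonment numbers to watch.</p>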
<p><strong>Human Data Entry</strong></p>
<p>Call centers almost always serve a data entry function as part of the customer service process. In addition, many companies have dedicated data entry staff to translate mail, fax, tradeshow-collected leads, or other transactions. This can be a great opportunity to clean your data up front, as you can certainly place a higher burden of getting the data right and enforced data validation on employees of your own company than you can on your customers and prospects.</p>
<p><strong>BUT</strong>, this turns out to be a stickier wicket than it seems at first blush. If I had a nickel for every time I heard someone living in the world of backend data propose data augmentation or enhancement by updating the human data entry processes to &#8220;just add one more quick step,&#8221; I&#8217;d be able to buy a <a href="http://www.starbucks.com/retail/nutrition_beverage_detail.asp?selProducts={0B4CE16F-937B-432E-AB3E-9831CB0B539D}&amp;x=11&amp;y=10&amp;strAction=GETDEFAULT" target="_blank">Starbucks Venti Caramel Frappuccino<sup>®</sup> blended coffee</a> (which is a lot of nickels, if you think about it). Two reasons that there should be a proceed-with-<em>extreme</em>-caution label placed prominently on any solution that heads down this path:</p>
<ul>
<li>Call centers typically live and die by the average handle time (AHT) for their calls; yes, they want to meet the customer&#8217;s needs, but they also, out of necessity, can save big dollars by cutting the AHT by a few seconds on average. Adding 5 or 10 seconds to every call can have a very real impact (and can make you some quick enemies with call center managers)</li>
<li>It&#8217;s easy to identify the benefits of more, more complete, or cleaner data&#8230;when it comes to backend processes and data analysis. But, is that benefit readily evident to the people whom you&#8217;re relying on to capture it? Does it benefit them directly, either through smoothing the immediate next steps in their process <em>or</em> by impacting their compensation? Due to the high-volume nature of call center and data entry work, data that is &#8220;just another field you need to fill out&#8221; is data that is at risk of falling prey to shortcuts (the first value in the dropdown, &#8220;aaa&#8221; in a text field, etc.). The most successful introductions of process changes have a net-no-change or net decrease in the number of steps/time/complexity of the process into which it is being introduced.</li>
</ul>
<p style="background-color: #edf5fa;  text-align: center;  border-color: black;  border-style: solid; border-width: thin;  padding: 15px;"><strong>Human data entry offers opportunities to get data that is more complete and cleaner&#8230;but those opportunities don&#8217;t come automatically.</strong></p>
<p>There are many other ways that data can enter your systems: provided by an intermediary (often semi-independent sales channels: distributors, resellers, etc.), sourced from a third-party lead sourcing company, passed in from another system within your company (often a system that doesn&#8217;t store the data in the same format or even have the same definitions for what specific fields mean and are used for), etc. There&#8217;s value in inspecting the sources of your customer data, assessing how clean the data is that comes from those different sources, and then, with the teeter-totter firmly in mind, investigating where and how to get that data coming in cleaner!</p>
<p style="text-align: right; "><em>Photo courtesy of </em><a title="jhritz on Flickr" href="http://www.flickr.com/photos/jhritz/"><em>jhritz</em></a><em>.</em></p>
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/05/18/the-teeter-totter-of-customer-data-management/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/05/18/the-teeter-totter-of-customer-data-management/#comments">2 comments</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/cleansing/" rel="tag">cleansing</a>, <a href="http://www.gilliganondata.com/index.php/tag/teeter-totter/" rel="tag">teeter-totter</a><br/>
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/05/18/the-teeter-totter-of-customer-data-management/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>40 Million Reasons Your Customer Data Isn&#8217;t As Current as You Think (or Hope)</title>
		<link>http://www.gilliganondata.com/index.php/2009/04/03/40-million-reasons-your-customer-data-isnt-as-current-as-you-think-or-hope/</link>
		<comments>http://www.gilliganondata.com/index.php/2009/04/03/40-million-reasons-your-customer-data-isnt-as-current-as-you-think-or-hope/#comments</comments>
		<pubDate>Sat, 04 Apr 2009 01:09:44 +0000</pubDate>
		<dc:creator>Tim Wilson</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[change of address]]></category>
		<category><![CDATA[population]]></category>

		<guid isPermaLink="false">http://www.gilliganondata.com/?p=266</guid>
		<description><![CDATA[While not getting as much buzz as social media when it comes to hot topics for 2009, &#8220;customer data management&#8221; is something that marketers are starting to take seriously. It&#8217;s easy to start envisioning fancy pictures of capturing and using customer data: Using behavioral data to drive timely and …]]></description>
			<content:encoded><![CDATA[<p>While not getting as much buzz as social media when it comes to hot topics for 2009, &#8220;customer data management&#8221; is something that marketers are starting to take seriously. It&#8217;s easy to start envisioning fancy pictures of capturing and using customer data:</p>
<ul>
<li>Using behavioral data to drive timely and relevant emails</li>
<li>Integrating information across different customer touchpoints/channels to deduce customers&#8217; and prospects&#8217; preferred communications medium</li>
<li>Building analytic models to predict which customers are most likely to churn and making special offers to retain them</li>
</ul>
<p>Those are all admirable goals. And, they&#8217;re all attainable. AND, they&#8217;re all going to be expected baseline capabilities within five years.</p>
<p>Before you tackle these higher order applications, it&#8217;s worth grounding yourself in an understanding of how rapidly customer data decays. Here are a couple of fun facts to wrap your head around on that front:</p>
<ul>
<li>The U.S. Postal Service processes over <strong>40 million</strong> address changes annually <a href="http://www.usps.com/ncsc/addressservices/moveupdate/changeaddress.htm" target="_blank">[source]</a></li>
<li>The population of the United States is estimated as being just north of <strong>300 million</strong> <a title="U.S. Population" href="http://www.census.gov/population/www/popclockus.html" target="_blank">[source]</a></li>
</ul>
<p>Clearly, this isn&#8217;t an apples-to-apples comparison. But, we tend to imagine that our customers and prospects are more static than, in reality, they are &#8212; who they work for, what their job title is, and, yes, even where they live.</p>
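<p>As a back-of-the-envelope sketch, here is what those two numbers imply if you (very loosely) treat each address change as one person moving and assume the same rate applies independently every year. Both assumptions are mine and are certainly wrong in the details, and the function name is made up for illustration.</p>

```python
address_changes_per_year = 40_000_000
us_population = 300_000_000

# Rough annual rate at which an address on file goes stale.
annual_move_rate = address_changes_per_year / us_population
print(f"{annual_move_rate:.1%}")  # 13.3%


def share_still_current(years: int, rate: float = annual_move_rate) -> float:
    """Share of addresses untouched after a number of years at a constant rate."""
    return (1 - rate) ** years


for years in (1, 3, 5):
    print(years, f"{share_still_current(years):.0%}")
```

<p>Even with generous error bars on the inputs, a customer file that nobody touches for a few years ends up with a large share of stale addresses.</p>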
<hr />
<p><small>&copy; Tim for <a href="http://www.gilliganondata.com">Gilligan on Data by Tim Wilson</a>, 2009. |
<a href="http://www.gilliganondata.com/index.php/2009/04/03/40-million-reasons-your-customer-data-isnt-as-current-as-you-think-or-hope/">Permalink</a> |
<a href="http://www.gilliganondata.com/index.php/2009/04/03/40-million-reasons-your-customer-data-isnt-as-current-as-you-think-or-hope/#comments">2 comments</a>
<br/>
Post tags: <a href="http://www.gilliganondata.com/index.php/tag/change-of-address/" rel="tag">change of address</a>, <a href="http://www.gilliganondata.com/index.php/tag/population/" rel="tag">population</a><br/>
</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gilliganondata.com/index.php/2009/04/03/40-million-reasons-your-customer-data-isnt-as-current-as-you-think-or-hope/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

