Welcome to Gilligan on Data, where you will find thoughts, musings, and, hopefully, not too many redundancies on the world of business data. If you missed the irony in the previous sentence, you may struggle with my writing style.


Tiger Woods won his 78th career PGA event on Sunday at The Players Championship. The commentators were tireless in their mentions of the fact that this was Woods’s 300th PGA event start.

I’m a bad golfer and a worse baseball player, but I found myself wanting to combine the two sports by calculating Woods’s “batting average” for PGA tour events. This required two major definitional leaps:

  • An “at bat” was a tournament
  • A “hit” was a win

This is a whopper of a stretch, I realize, but stick with me, anyway. :-)

The batting average math is now simple: with Sunday’s win, Woods’s career batting average in tour events was 78/300, or .260! In baseball, a “good” hitter bats over .300. Of course, for my definitions to carry over to real baseball, a player would only get credited with a hit if that hit were a game-winning walkoff home run!
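For anyone who wants to check the math, here’s a tiny Python sketch of the calculation under the post’s stretched definitions (a win is a hit, a tournament start is an at bat). The function name is my own, and trimming the leading zero just mimics the way batting averages are usually written.

```python
def batting_average(hits: int, at_bats: int) -> str:
    """Return a baseball-style average string, e.g. 78 hits in 300 at bats -> '.260'."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    # Format to three decimals, then drop the leading zero the way box scores do.
    formatted = f"{hits / at_bats:.3f}"
    return formatted[1:] if formatted.startswith("0") else formatted


# "Hits" are wins and "at bats" are PGA tour starts, per the definitions above.
print(batting_average(78, 300))
```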

This led me to wonder how Woods’s batting average has moved over his career to date. So, using data from Woods’s profile on pgatour.com, I plotted it out (even though Woods was an amateur until 1996, the tournaments he played in before that still counted as PGA tour starts):

Tiger Woods Cumulative Win Percentage

His batting average peaked in 2009, just a couple of months before he had his worst Thanksgiving ever.

As the end of the chart shows, it does look like he is on his way back. Keep in mind that, like a real batting average, the fewer tournaments he’d played in, the more any single result, win or non-win, would move his cumulative average. That’s one reason that, in baseball, there is more focus on the batting average for the season than on the career batting average.
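To make that effect concrete, here’s a minimal Python sketch, assuming per-tournament results are available as a simple list of win/non-win flags. The sample data is made up for illustration (it is not Woods’s record), and both function names are mine. `cumulative_win_pct` computes the kind of running average plotted above, and `next_result_swing` shows how far one more start can move it.

```python
from itertools import accumulate


def cumulative_win_pct(results: list[bool]) -> list[float]:
    """Running wins / starts after each tournament (True = win)."""
    running_wins = accumulate(int(won) for won in results)
    return [wins / starts for starts, wins in enumerate(running_wins, start=1)]


def next_result_swing(wins: int, starts: int) -> tuple[float, float]:
    """Change in the cumulative average from one more start: (if a win, if a non-win).

    At a fixed average a = wins / starts, the two swings work out to
    (1 - a) / (starts + 1) and -a / (starts + 1), so both shrink like 1 / starts.
    """
    current = wins / starts
    if_win = (wins + 1) / (starts + 1) - current
    if_non_win = wins / (starts + 1) - current
    return if_win, if_non_win


# Hypothetical run of starts, for illustration only (not Woods's actual record).
sample = [False, False, True, False, True, True, False, False]
print([round(p, 3) for p in cumulative_win_pct(sample)])

# The same .260 average early vs. late in a career (illustrative numbers).
for wins, starts in [(13, 50), (78, 300)]:
    up, down = next_result_swing(wins, starts)
    print(f"{wins}/{starts}: win {up:+.4f}, non-win {down:+.4f}")
```

At 13 wins in 50 starts, one result moves the average roughly six times as much as it does at 78 wins in 300 starts, in either direction.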

So, that got me wondering how this tour season compares to Woods’s past seasons. The gray in the chart below shows his win percentage within each season, as of that season’s end:

Tiger Woods Cumulative and Yearly Win Percentage

To date, this is his highest win percentage of any year other than 2008, which was severely shortened by a knee injury. In 2008, he won 4 out of 6 PGA events before his season ended. In 2013, he has won 4 out of 7 so far!
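The season comparison is just wins divided by starts within each year, rather than the running career total. Here’s a quick Python sketch using only the two seasons quoted above (the function name is my own):

```python
def season_win_pct(records: dict[int, tuple[int, int]]) -> dict[int, float]:
    """Map each season to wins / starts for that season alone (not cumulative)."""
    return {year: wins / starts for year, (wins, starts) in records.items()}


# Season records quoted in the post: (wins, PGA starts); 2013 is "so far."
seasons = {2008: (4, 6), 2013: (4, 7)}
for year, pct in sorted(season_win_pct(seasons).items()):
    print(f"{year}: {pct:.3f}")
```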

Idle fun with Excel and online data!

 

 


I’m chunking up my reflections on last month’s eMetrics conference in San Francisco into several posts. I had a list of eight possible topics, and this is the fourth and (probably) final one that I’ll actually get to.

I’ve attended the “privacy” session at a number of recent eMetrics, and the San Francisco one represented a big step forward in terms of specificity. “Privacy” seems to be a powerful word in the #measure industry — it’s a single word that seems to magically turn many people and companies into ostriches! It’s not that we want to avoid the topic, but there is so much complexity and uncertainty that putting our heads in the sand and kicking the can down the road (everyone loves a good mixed metaphor, right?) seems to be the default course of action.

In the session sardonically titled “Attend this Session or Pay €1 Million,” René Dechamps Otamendi of Mind Your Privacy covered European privacy regulations and Joanne McNabb of the California Department of Justice covered California and US privacy regulations.

When Pop Culture Picks It Up…

I was a West Wing fan, but had no memory of this clip that René shared:

When you’ve got mainstream network television referencing a topic, it’s a topic that is at least on the periphery of the mainstream.

“Fundamental Right” vs. “Business/Consumer Negotiation”

René pointed out that many Americans miss the point when it comes to the European privacy regulations: in typical America-centric fashion, we ignore history. We see privacy as a topic that is up for debate: how do we protect consumers with minimal regulation so that businesses can capitalize on as much personal data as possible?

In Europe…there was the Holocaust. René described how, in The Netherlands prior to WWII, the government maintained detailed and accurate records on every citizen. When the Nazis invaded, this data made it very easy for them to identify and persecute Jews. Of the 140,000 Jews who lived in The Netherlands prior to 1940, only 30,000 survived the war, and historians point to the availability of this data as one of the main reasons for this. Yikes! For many Europeans, this sort of history is both deeply embedded and strongly linked to the topic of personal and online privacy.

Thinking of privacy as an undisputed, fundamental right is somewhat eye-opening.

It Doesn’t Matter Where Your Company Is Based

This isn’t exactly news, but it seems to be one of the excuses marketers use for burying their heads in the sand: “We’re based in Ohio — not California or Europe. So, how much do we have to worry about privacy regulations there?”

The answer comes down to where your customers are. The European Directive, like California’s regulations, does not care where a company is based. Both focus on where the consumers interacting with those companies are. Pull up the visitor geography reports in your web analytics platform and look at where your traffic is coming from. Anywhere that accounts for a non-minuscule percentage of traffic is likely somewhere whose privacy regulations you need to understand.
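If your analytics tool lets you export that geography report, a short script can flag the regions worth a closer privacy-regulation look. This is a sketch under assumptions: the export is a simple region-to-visits mapping, the 1% cutoff is an arbitrary stand-in for “non-minuscule,” and the numbers are invented.

```python
def regions_to_review(visits_by_region: dict[str, int], min_share: float = 0.01) -> list[str]:
    """Regions whose share of total visits meets a (hypothetical) cutoff, largest first."""
    total = sum(visits_by_region.values())
    if total == 0:
        return []
    shares = {region: count / total for region, count in visits_by_region.items()}
    # Keep regions at or above the cutoff, then sort by share, descending.
    return sorted((r for r, s in shares.items() if s >= min_share), key=lambda r: -shares[r])


# Hypothetical export from a visitor geography report.
report = {"Ohio": 52_000, "California": 9_000, "Germany": 1_500, "France": 300, "Brazil": 40}
print(regions_to_review(report))
```

Where to set the cutoff is a judgment call for your legal team, not something the script can decide.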

Why California Instead of “the U.S.”?

Joanne pointed out that California is clearly at the forefront when it comes to developing, implementing, and enforcing privacy regulations in the U.S. The California Online Privacy Protection Act (CalOPPA) has been in effect since 2004 (although it was not widely understood for the first few years). That’s closing in on a decade!

To me, this sounded a lot like fuel economy standards in the auto industry — California is a large enough market that businesses can’t afford to ignore the state’s residents. At the same time, other states, and the federal government (because the U.S. has a long — and checkered — history of using the states as laboratories for testing ideas), are watching California to see what they figure out. There is a very good chance that what works for California will be a basis for other states and for federal regulations.

Is California the Same As Europe?

Yes and no. They’re the same in that they have a similar orientation towards “individuals’ rights.” They’re the same in that they are increasingly starting to enforce their regulations (with very real fines levied on companies).

They’re different…in that the U.S. and Europe are different — both culturally and structurally.

They follow developments in each other’s worlds, but they’re not actively marching towards a single, unified regulation.

So, Where Should Companies Start?

Step 1: Check your privacy policy. Really. Read it. Read it for your country-specific sites (simply translating your U.S. privacy policy into German doesn’t work!). If you give it a really close read, are you even complying with what you say you are?

Step 2: Learn some details. For Europe, reach out to René at the email address in the image below. He’s got a document that explains the ins and outs of EU privacy regulations (if the number “27” doesn’t mean anything to you, you haven’t learned enough):

EU Privacy: 27 DPAs / René’s Email

For California, one resource is the California Attorney General’s site for online privacy. Unfortunately, it is a bureaucratically built site, so be ready for some heavy document-wading.

Step 3: Educate your company. This one is no small task: when the question of who to include in that discussion came up, it seemed like it would have been simpler to list who not to include. The web team, marketing, legal, and IT are a good start. The best hook is “We could be fined 1,000,000 euros…”

In Short: It’s Still Messy, but Things Are Getting Clearer

The heading says it all. “We” all need to take our heads out of the sand and get smarter on this. If a regulatory agency comes calling, the worst response is, “Tell me who you are again?” The best (but not currently possible) response is, “We’re totally compliant.” A good response is, “We’re working on it, here’s what we’ve done, and here’s our roadmap to do more.”


I’m chunking up my reflections on last week’s eMetrics conference in San Francisco into several posts. I’ve got a list of eight possible topics, but I seriously doubt I’ll manage to cover all of them.

On Tuesday, I attended Ian Lurie’s presentation: “Data That Persuades: How to Prove Your Point.” This session was a “fist pumper” for me, as Ian is as frustrated by crappy data visualization as I am (he led off the presentation by showing a mouth guard, sharing that he wears one at night because he grinds his teeth, and then noting that seeing data poorly presented was a big source of the stress behind that grinding!).

One of the ways Ian illustrated the importance of putting care into the way data gets presented was with this image:

Read, React, Respond

I think it’s fair to say this is a representation of the three types of memory:

  • The “lizard brain” represents iconic memory — the “visual sensory register.” It’s where preattentive cognitive processing occurs. If we don’t put something forth that is clear and instantaneously perceptible, then the information won’t get past the lizard brain.
  • The “ape brain” represents short-term memory, where conscious thought and basic processing occur. The initial “Do I care?” question gets asked and answered.
  • The “human brain” represents longer-term memory — where we actually need to digest the information and develop and implement a response.

Ian also spent a lot of time on Tufte’s data-ink ratio — imploring the audience to be heavily reductionist in the visualization of data by removing extraneous words, lines, tick marks, etc. so that “the data” really comes through.

Otherwise, the recipients of the data will be like screaming goats:

Screaming Goat
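For reference, Tufte’s data-ink ratio (as defined in The Visual Display of Quantitative Information) is the share of a graphic’s ink that actually presents data. Pushing it toward 1 by erasing everything that can go without losing information is the reductionism Ian was urging:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
```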



The closing keynote at eMetrics was Matt Wilson and Andrew Janis talking about how they’ve been evolving the role of digital (including social) analytics at General Mills.

Almost as a throwaway aside, Matt noted that one of the ways he has gone about increasing the use of their web analytics platform by internal users is with video:

  1. He keeps a running list of common use cases (types of data requests)
  2. He periodically makes 2-minute (or less) videos of how to complete these use cases

Specifically:

  • He uses Snagit Pro to do a video capture of his screen while he records a voiceover
  • If a video lasts more than 120 seconds, he scraps it and starts over
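Matt’s process is essentially a backlog with one hard rule. Here’s a hypothetical Python sketch of tracking it. The use-case names and the data structure are mine, not his; only the 120-second limit comes from the talk.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SECONDS = 120  # Matt's rule: anything longer gets scrapped and re-recorded


@dataclass
class UseCase:
    """One common data request from the running list (names here are hypothetical)."""
    name: str
    video_seconds: Optional[int] = None  # length of the current recording, if any


def needs_recording(use_cases: list[UseCase]) -> list[str]:
    """Use cases with no video yet, or whose video breaks the 2-minute rule."""
    return [
        uc.name
        for uc in use_cases
        if uc.video_seconds is None or uc.video_seconds > MAX_SECONDS
    ]


backlog = [
    UseCase("Top landing pages last month", 95),
    UseCase("Campaign traffic by source", 140),
    UseCase("Internal site search terms"),
]
print(needs_recording(backlog))
```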

Outside of basic screen caps with annotations, the “video with a voiceover” is my favorite use of Snagit. When I need to “show several people what is happening,” it’s a lot more efficient than trying to find a time for everyone to jump into GoToMeeting or a Google Hangout. I just record my screen with my voiceover, push the resulting video to YouTube (in a non-public way — usually “anyone with the link” mode), and shoot off an email.

I’ve never tried this with analytics demos — as a way to efficiently build a catalog of accessible tutorials — but I suspect I’m going to start!



One of the first sessions I attended at last week’s eMetrics was Jim Novo’s session titled “The Evolution of an Attribution Resolution.” We’ll (maybe) get to the “attribution” piece in a separate post (because Jim turned on a light bulb for me there), but, for now, we’ll set that aside and focus on a sub-theme of his talk.

Later at the conference, Jennifer Veesenmeyer from Merkle hooked me up with a teaser copy of an upcoming book that she co-authored with others at Merkle called It Only Looks Like Magic: The Power of Big Data and Customer-Centric Digital Analytics. (It wasn’t like I got some sort of super-special hookup. They had a table set up in the exhibit hall and were handing copies out to anyone who was interested. But I still made Jennifer sign my copy!) Due to timing and (lack of) internet availability on one of the legs of my trip, I managed to read the book before landing back in Columbus.

A Long-Coming Shift Is About to Hit

We’ve been talking about being “customer-centric” for years. It seems like eons, really. But, almost always, when I’ve heard marketers bandy about the phrase, they mean, “We need to stop thinking about ‘our campaigns’ and ‘our site’ and ‘our content’ and, instead, start focusing on the customer’s needs, interests, and experiences.” That’s all well and good. Lots of marketers still struggle to actually do this, but it’s a good start.

What I took away from Jim’s points, the book, and a number of experiences with clients over the past couple of years is this:

Customer-centricity can be made much more tangible…and much more tactically applicable when it comes to effective and business-impacting analytics.

This post covers a lot of concepts that, I think, are all different sides of the same coin.

Visitors Trump Visits

Cross-session tracking matters. A visitor who did nothing of apparent importance on their first visit to the site may do nothing of apparent importance across multiple visits over multiple weeks or months. But…that doesn’t mean what they do and when they do it isn’t leading to something of high value to the company.
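A toy example makes the visit-versus-visitor difference concrete. This Python sketch assumes a session log with a visitor ID, a visit number, and a converted flag; the data is invented. The same conversions look very different depending on whether you divide by visits or by visitors.

```python
from collections import defaultdict

# Hypothetical session log: (visitor_id, visit_number, converted_in_this_visit)
sessions = [
    ("v1", 1, False), ("v1", 2, False), ("v1", 3, True),
    ("v2", 1, False),
    ("v3", 1, False), ("v3", 2, True),
]


def visit_conversion_rate(rows) -> float:
    """Share of individual visits that converted."""
    return sum(converted for _, _, converted in rows) / len(rows)


def visitor_conversion_rate(rows) -> float:
    """Share of visitors who converted in any of their visits."""
    converted_by_visitor = defaultdict(bool)
    for visitor, _, converted in rows:
        converted_by_visitor[visitor] |= converted  # True once any visit converts
    return sum(converted_by_visitor.values()) / len(converted_by_visitor)


print(f"visit-level:   {visit_conversion_rate(sessions):.2f}")
print(f"visitor-level: {visitor_conversion_rate(sessions):.2f}")
```

In the visit view, v1’s first two visits count as failures; in the visitor view, they’re part of a path that ended in a conversion.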

Caveat (defended) to that:


Does this mean visits are dead? No. Really, unless you’re prepared to answer every new analytics question with, “I’ll have an answer in 3-6 months once I see how visitors play out,” you still need to look at intra-session results.

When I asked Jim about this, his response totally made sense. Paraphrasing heavily: “Answering a question with a visit-driven response is fine. But, if there’s a chance that things may play out differently from a visitor view, make sure you check back in later and see if your analysis still holds over the longer term.”

Cohort Analysis

Cohort analysis is nothing more than a visitor-based segment. Now, a crap-ton of marketers have been smoking the Lean Startup Hookah Pipe, and, in the feel-good haze that filled the room, have gotten pretty enamored with the concept. Many analysts, myself included, have asked, “Isn’t that just a cross-session segment?” But “cross-session segment” isn’t nearly as fun to say.

Cohort Analysis Tweet

Here’s the deal with cohort analysis:

  • It is nothing more than an analysis based around segments that span multiple sessions
  • It’s a visitor-based concept
  • It’s something that we should be doing more (because it’s more customer-centric!)
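To show just how much cohort analysis is a cross-session segment, here’s a minimal Python sketch, assuming you can get raw (visitor ID, week) visit records out of your tool; the data below is made up. Visitors are grouped by the week of their first visit, and each row reports the share of that cohort that was active N weeks later.

```python
from collections import defaultdict

# Hypothetical visit log: (visitor_id, week_number_of_visit)
visits = [
    ("a", 1), ("a", 2), ("a", 4),
    ("b", 1),
    ("c", 2), ("c", 3),
    ("d", 2), ("d", 4),
]


def cohort_retention(rows) -> dict[int, dict[int, float]]:
    """For each first-visit-week cohort, share of its visitors active N weeks later."""
    weeks_by_visitor = defaultdict(set)
    for visitor, week in rows:
        weeks_by_visitor[visitor].add(week)

    cohorts = defaultdict(list)
    for weeks in weeks_by_visitor.values():
        cohorts[min(weeks)].append(weeks)  # cohort = week of first visit

    table = {}
    for start, members in sorted(cohorts.items()):
        max_offset = max(max(w) for w in members) - start
        table[start] = {
            offset: sum((start + offset) in w for w in members) / len(members)
            for offset in range(max_offset + 1)
        }
    return table


for cohort, row in cohort_retention(visits).items():
    print(cohort, {k: round(v, 2) for k, v in row.items()})
```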

The problem? Mainstream web analytics tools capture visitors cross-session, and they report cross-session “unique visitors,” but this is only in aggregate. You can dig into Adobe Discover to get cross-session detail, or, I imagine, into Adobe Insight, but that is unsatisfactory. Google has been hinting that this is a fundamental pivot they’re making — to get more foundationally visitor-based in their interface. But, Jim asked the same question many analysts are:

Visitor Value Prediction

Having started using and recommending visitor-scope custom variables more and more often, I’m starting to salivate at the prospect of “visitor” criteria coming to GA segments!

Surely, You’ve Heard of “Customer Lifetime Value”?

“Customer Lifetime Value” is another topic that gets tossed around with reckless abandon. Successful retailers, actually, have tackled the data challenges behind this for years. Both Jim and the Merkle book brought the concept back to the forefront of my brain.

It’s part and parcel of everything else in this post: getting beyond “What value did you (the customer) deliver to me today?” to “What value have you delivered (or will you deliver) to me over the entire duration of our relationship?” (with an eye to the time value of money so that we’re not just “hoping for a payoff wayyyy down the road” and congratulating ourselves on a win every time we get an eyeball).
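The “time value of money” piece usually means discounting future value back to today. One common simplified form, sketched in Python below, sums expected per-period margins m_t divided by (1 + d)^t. The margins and the 10% rate are hypothetical, and real CLV models often add explicit retention probabilities rather than folding them into the expected margins as this sketch does.

```python
def discounted_lifetime_value(margins_by_period: list[float], discount_rate: float) -> float:
    """Sum of expected per-period margins, each discounted back to today.

    margins_by_period[t] is the margin expected t periods from now (t = 0 is now);
    value = sum(m_t / (1 + d) ** t). Retention is assumed to be already baked into
    the expected margins, which is one common simplification, not the only model.
    """
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -1")
    return sum(m / (1 + discount_rate) ** t for t, m in enumerate(margins_by_period))


# Hypothetical customer: $100 margin now, then shrinking expected margins for four more years.
expected = [100.0, 80.0, 60.0, 40.0, 20.0]
print(round(discounted_lifetime_value(expected, 0.10), 2))
print(sum(expected))  # undiscounted total, for comparison
```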

Digital data is actually becoming more “lifetime-capable”:

  • Web traffic — web analytics platforms are evolving to be more visitor-based than visit-based, enabling cross-session tracking and analysis
  • Social media — we may not know much about a user (see the next section), but, on Twitter, we can watch a username’s activity over time, and even the most locked down Facebook account still exposes a Facebook ID (and, I think, a name)…which also allows tracking (available/public) behavior over time
  • Mobile — mobile devices have a fixed ID. There are privacy concerns (and regulations) with using this to actually track a user over time, but the data is there. So, with appropriate permissions, the trick is just handling the handoff when a user replaces their device

Intriguing, no?

And…Finally…Customer Data Integration

Another “something old is new again” is customer data integration, the “customer” angle of the world of Master Data Management. In the Merkle book, the authors pointed out that the elusive “master key” that is the Achilles’ heel of many customer data integration efforts is getting both easier and more complicated to work around.

One obvious-once-I-read-it concept was that there are fundamentally two different classes of “user IDs:”

  • A strong identifier is “specifically identifiable to a customer and is easily available for matching within the marketing database.”
  • A weak identifier is “critical in linking online activity to the same user, although they cannot be used to directly identify the user.”

Cookie IDs are a great example of a weak identifier. As is a Twitter username. And a Facebook user ID.

The idea here is that a sophisticated map of IDs — strong identifiers augmented with a slew of weak identifiers — starts to get us to a much richer view of “the customer.” It holds the promise of enabling us to be more customer-centric. As an example:

  • An email or marketing automation system has a strong identifier for each user
  • Those platforms can attach a subscriber ID to every link back to the site in the emails they send
  • That subscriber ID can be picked up by the web analytics platform (as a weak identifier) and linked to the visitor ID (cookie-based — also a weak identifier)
  • Now, you have the ability to link the email database to on-site visitor behavior

This example is not a new concept by any means. But, in my experience, the way each of the platforms involved in a scenario like this has preferred to work is that they set their own strong and weak identifiers. What I took away from the Merkle book is that we’re getting a lot closer to being able to have those identifiers flow between systems.
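The four-step email example can be sketched in a few lines of Python. Everything here is hypothetical: the `sub_id` query parameter, the IDs, and the in-memory “email database.” In practice, this stitching would happen inside your analytics and marketing platforms rather than in a script.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

SUBSCRIBER_PARAM = "sub_id"  # hypothetical query-string parameter the email platform appends


def subscriber_from_landing_url(url: str) -> Optional[str]:
    """Pull the subscriber ID (a weak identifier on the site) out of an email link."""
    values = parse_qs(urlparse(url).query).get(SUBSCRIBER_PARAM)
    return values[0] if values else None


def link_visitors_to_subscribers(hits) -> dict[str, str]:
    """Build cookie visitor ID -> subscriber ID from (visitor_id, landing_url) hits."""
    mapping = {}
    for visitor_id, url in hits:
        subscriber = subscriber_from_landing_url(url)
        if subscriber is not None:
            mapping.setdefault(visitor_id, subscriber)  # keep the first link seen
    return mapping


# Hypothetical email database: subscriber ID -> strong identifier (email address).
email_db = {"S-1001": "pat@example.com", "S-1002": "lee@example.com"}

hits = [
    ("cookie-abc", "https://www.example.com/sale?sub_id=S-1001&utm_medium=email"),
    ("cookie-abc", "https://www.example.com/checkout"),
    ("cookie-xyz", "https://www.example.com/"),
]

for visitor, subscriber in link_visitors_to_subscribers(hits).items():
    print(visitor, "->", subscriber, "->", email_db.get(subscriber))
```

Once the cookie ID and the subscriber ID are linked, every later visit from that cookie can be tied back to the email record, which is exactly where the permission question has to be answered first.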

Again…privacy concerns cannot be ignored. They have to be faced head on, and permission has to be granted where permission would be expected.

Lotta’ Buzzwords…All the Same Thing?

None of the concepts in this post are really “new.” They’re not even “new to me.” What I hadn’t done was connect the dots: they are all largely the same thing.

That, I think, is exciting!

 


I keep posting things elsewhere and forgetting to get a post here to reference them.

Last fall, I pitched a session topic to Jim Sterne for the eMetrics conference that occurred last week. At the time, I was just a few weeks into my job at Clearhead, and I figured that, by April 2013, I’d easily have a fully baked, deliverable-supporting process that I could use as the basis for the session.

You’re expecting this sentence — the one following that last paragraph — to say, “Boy…was I wrong!” The fact is…I was mostly right!

A handful of articles, posts, and content all came out of my effort to get spit and polish on the material in time for the session:

Lots of content. You be the judge if it’s good content. Or, if you’re reading this shortly after it got posted and you’re in central Ohio, come get an abbreviated version at this month’s Columbus Web Analytics Wednesday.