Reporting Tools Can’t Fix Bad Data

Wednesday, March 10th, 2010 by tgwilson_php 1 Comment

Stephen Few wrote a brilliant (and rather scathing) post recently: Big BI is Stuck: Illustrated by SAP BusinessObjects Explorer. In the post, he extensively quotes marketingspeak from various SAP executives and then picks apart their claims. He follows that part of the post with excerpts from a review of their new BusinessObjects Explorer, which highlights that (alas!) the tool is not the killer app that makes access to all data easy and intuitive. Of course, SAP is by no means the first company to underdeliver on that promise!

The quote that really jumped out for me in the post, though, was this one:

Anyone who understands BI, however, knows that no interface, no matter how magical, will give you access to data that isn’t available, will clean data that is dirty, or will simplify the navigation of complicated operational databases.

That quote alone warranted a mini-blog post, if for no other reason than to allow me to quickly get my hands on it in the future when I’m in the midst of bashing my head against a cinder block wall of requests for crisp, clean analytic insights from a messy, messy world of data.

Mr. Few, I already had you placed high on a pedestal. Please accept this footstool upon which you can perch to be raised up just a little bit higher thanks to the clarity and insight within that one sentence you have penned!

Web Analytics Tracking on a Facebook Page

Monday, March 1st, 2010 by Tim Wilson 5 Comments

I’ve been on a quest now for several months to crack the code of how to get web analytics tracking on a Facebook fan page. My (and our clients’) desire to do so shines an interesting light on the way that social media has blurred the concept of a “web site.” Back in the day, it was pretty simple to identify what pages you wanted to track: if the user perceived the page as being part of your site, you wanted to track that page with your web analytics software (even if it was an area of your site that was hosted by some other third party that had specialized capabilities like managing job openings, events, or discussion forums).

Social media, and Facebook in particular, is starting to blur those lines. If your company manages a branded fan page on Facebook, and that page is a place through which your customers and target customers actively engage with the brand, isn’t it acting a bit like your web site? Clearly, a Facebook page is not part of your site, but it’s a place on the web where consumers actively engage with brands, both to give and receive brand-related content. It acts a lot like a traditional web site in that regard.

As companies begin to invest more heavily in Facebook pages — both through creative development and staff to engage with consumers who interact with their brand through a fan page — there is an increasing need to have better visibility into activity on those pages. I wrote an entire post on the subject of Facebook measurement back in January, and I’ve had to update it several times since then as Facebook has rolled out changes and as I’ve gotten a bit deeper into the web analytics aspects of that tracking.

Just last week, Webtrends announced some damn slick enhancements to Analytics 9 that not only allow tracking well beyond what Facebook Insights offers, but also bring in some specific (anonymous) user information so that the traffic can be segmented in useful ways (the post on mashable.com shows some screen captures of the resulting data). I fully expect that Omniture will come out with something comparable as soon as they can, but I don’t think they have that level of tracking yet (if you know differently, please leave a comment to let me know). [Update: Coremetrics announced some new Facebook tracking capabilities shortly after this post was published.] My one concern with the Webtrends solution is that, as best as I can tell, it requires the tracked pages to use a Facebook application that will pop up an “Allow Access?” question to the user, and the user has to indicate this is okay before getting to the content on the page. Lots of applications have this, but, at Resource Interactive, we’ve also had lots of clients for whom we have built very rich and interactive experiences on their fan pages…without requiring anything of the sort. If the access is needed to enable the application to deliver value to the user, then this is fine, and the improved trackability is just scrumptious gravy that comes along for the ride. If the access is needed just for tracking, then I would have to think long and hard about it: data capture should always be somewhere between excruciatingly minimally visible to the user and not visible at all.

The question, then, is, “What can be tracked unobtrusively, and how can it be done?” This post will attempt to answer that question.

Why Is It So Tricky in the First Place?

Facebook, largely for privacy reasons, locks down what can happen on its pages. It may make your head hurt (it certainly makes mine) to understand all of the cans vs. cannots for different scenarios, but I’ll take a crack at a short list. There are two basic scenarios that a customer might experience as a “tab on a brand’s page:”

  • The brand can add a tab to the page and drop some form of Facebook application into it; in this scenario, iFrames are not allowed, and Javascript cannot be executed
  • The brand can make a separate application, and, on the “application canvas,” they can drop an iFrame, and Javascript can be executed within that iFrame; but, since the application canvas cannot exist “in a tab,” the design for the page has to include tabs to mimic the fan page, which is a bit clunky and raises some other user experience challenges

Okay, so that was easy enough…assuming you’re following the custom tab / application / application canvas terminology. Both of these scenarios allow the embedding of Flash objects on the page.

Facebook doesn’t allow Javascript on custom tabs, but it does allow its own similar scripting language, called FBJS (these tabs also use “FBML” rather than HTML for developing the page; it’s similar to HTML but not identical).

What all of this means is that it’s not as simple as “just drop your web analytics page tag on the page” and you’ll get tracking. But that doesn’t mean you’re entirely SOL. This post is almost entirely geared towards custom Facebook tabs, and, really, it assumes that the content on those pages is based on an FBML application.

Tracking Basic Visits and Page Views for a Custom Tab?

We’ve cracked this to varying degrees for two different web analytics tools: Google Analytics and Webtrends. We haven’t had a pressing need to tackle it for anything else, but I’m pretty sure the same principles will apply and we’ll be able to make it happen. In both cases, the approach is pretty much the same — you need to have the FBML and FBJS on the page make an image call to the web analytics program. To pull it off, you do need to have a good understanding of how web analytics tools collect data, which I wrote an extensive post about a few days ago.

In the case of Webtrends, the simplest thing to do is treat the page like a page where every visitor who comes has Javascript disabled in their browser. I’ll cover that later in this post.

For Google Analytics, things are a little dicier because Google Analytics doesn’t have out-of-the-box “noscript” capabilities. You have to figure out all of the appropriate parameter values and then just make a full image call (again, reference the link above for a detailed explanation of what that means). You’re not going to get all of the data that you would get from running the standard page tag (which I’ll touch on a bit more later in this post), but you can certainly get page views and unique page views with a little FBJS work.

Start out by creating a new Google Analytics UA number for your Facebook tracking. This will give you a profile with a new ID of the form: UA-XXXXX-YY. You will have to provide a domain name, but what that domain name is is immaterial: “<brand>.facebook.com” makes sense, but it can really be anything you want.

Then, it’s just a matter of figuring out the list of values that you are going to tack on as parameters to the Google Analytics image call (http://www.google-analytics.com/__utm.gif). Below are some tips on that front (refer to the Google Analytics documentation for a deeper explanation of what each parameter is). I’ll discuss four of them (utmn, utmdt, utmp, and utmcc) in greater detail after the list:

  • utmwv: 4.6.5 (or a newer version — I don’t think it’s critical)
  • utmn: needs to be a random number between 100000000 and 999999999 (more on this in a bit)
  • utmhn:  <brand>.facebook.com (or something else — again, not critical)
  • utmcs: leave blank
  • utmsr: leave blank
  • utmsc: leave blank
  • utmul: leave blank
  • utmje: leave blank
  • utmfl: leave blank
  • utmdt:  the title of the page (whatever you want to call it)
  • utmhid: leave blank
  • utmr: leave blank
  • utmp: a “URL” for the page
  • utmac: the Google Analytics ID you set up (UA-XXXXX-YY)
  • utmcc: __utma%3D1.<session-persistent ID>.1252000967.1252000968.1252000969.1%3B

This is as simple as it gets. Obviously, all of the “leave blanks,” as well as the limited number of “cookie values” being passed, mean that you’re not going to get nearly as rich information for the visitors to this tab (you should be able to just eliminate the “leave blank” parameters entirely from the image call). You will get page views and unique page views, and you can set up goals and funnels across tabs if you want. You can also start getting a little fancier and inserting campaign tracking parameters and other information, but start here and get the basics working first; you can always augment later (and please come back and comment here with what you figure out!).

For the four parameters called out above (utmn, utmdt, utmp, and utmcc), two are ones that you will predefine for the tab itself (they’re essentially static) and two are ones that will require a little FBJS magic to make happen.

Let’s start with the two static ones:

  • utmdt: this is normally the <title> tag for the page that is being visited; you can make it any plain English set of text you want, but you need to replace spaces and other special characters with the appropriate URL encoding
  • utmp: this is the URL for the page; you certainly can navigate to the custom tab in Facebook and use that, but I suggest just making it a faux URL, similar to how you would name a virtual pageview when doing onclick tracking; again, you will need to make this an appropriately URL encoded value (that mainly means replacing each “/” in the URL you come up with with “%2F”)
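To make the encoding of those two static values concrete, here is a minimal sketch in Python. The tab itself would use FBML/FBJS; Python is used here only to show the transformation, and the page title and faux URL are made-up examples rather than anything from a real tab.

```python
from urllib.parse import quote

# Hypothetical values for a custom tab; name them however suits your reporting.
page_title = "Facebook Tab: Spring Promo"
faux_url = "/facebook/tabs/spring-promo"

# safe="" forces "/" to be encoded as %2F too (quote() leaves "/" alone by default).
utmdt = quote(page_title, safe="")
utmp = quote(faux_url, safe="")

print("utmdt=" + utmdt)
print("utmp=" + utmp)
```

Since both values are static for a given tab, you would only need to work them out once and then paste the encoded results into the image call.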

The two other values require a little more doing, although it’s apparently pretty straightforward with FBJS (if you’re not a Javascript / FBJS jockey, as I’m not, you may need to track down a willing collaborator who is):

  • utmn: the sole purpose of this value is to make the overall GIF request a “new” URL; it’s a random (or, at least, quasi-random) number between 100000000 and 999999999 that should change every time there is a new load of the page
  • utmcc: the main thing you want to do here is generate a value between 1000000000000000000 and 9999999999999999999 that will stay with the visitor throughout his visit to Facebook. The other values in the __utma subparameter of utmcc are various date-stamps; if you want to get fancy, you can try to populate some of those as well; overall, utmcc is supposed to be a set of cookie values that persists on the user’s machine — we’re not actually dropping a cookie here, which means we’re not going to be able to track any of the sorts of “lifetime unique visitors”-dependent measures within Google Analytics (that includes “new vs. returning” visitors — everyone’s going to look like a new visitor in your reporting)
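As a rough sketch of how those two dynamic values could be generated, here is the logic in Python rather than FBJS. The function names are hypothetical, and using the current time for all three timestamps is an assumption of this sketch; the template in the parameter list uses fixed placeholder values instead. How the ID gets held onto for the rest of the visit is left out here.

```python
import random
import time

def make_utmn():
    """Cache-busting number in the range described above; regenerate on every page load."""
    return random.randint(100000000, 999999999)

def make_visitor_id():
    """19-digit ID, meant to be generated once and reused for the rest of the visit."""
    return random.randint(10**18, 10**19 - 1)

def make_utmcc(visitor_id, now=None):
    """Fill in the __utma template from the parameter list, already URL-encoded.

    Assumption: the same timestamp is used in all three date-stamp slots.
    """
    ts = int(now if now is not None else time.time())
    return "__utma%3D1.{0}.{1}.{1}.{1}.1%3B".format(visitor_id, ts)

print(make_utmn())
print(make_utmcc(make_visitor_id()))
```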

Make sense? I built a spreadsheet that would concatenate values I’d populated for these variables, but it just isn’t pretty enough to share. But, you just need to tack all of these values together as I described in my last post and drop that as an image call on your custom tab.
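In place of the spreadsheet, here is a hedged sketch of the concatenation in Python. The function name and the example hostname, title, and path are assumptions for illustration; the “leave blank” parameters are simply omitted, and the timestamps in utmcc are copied from the placeholder template in the parameter list.

```python
import random
from urllib.parse import quote

GA_GIF = "http://www.google-analytics.com/__utm.gif"

def build_ga_image_call(account, hostname, title, path, visitor_id):
    """Concatenate the non-blank parameters into a single __utm.gif request URL."""
    params = [
        ("utmwv", "4.6.5"),
        ("utmn", str(random.randint(100000000, 999999999))),  # new on every call
        ("utmhn", hostname),
        ("utmdt", title),
        ("utmp", path),
        ("utmac", account),
        # Date-stamps copied from the placeholder template in the list above.
        ("utmcc", "__utma=1.{}.1252000967.1252000968.1252000969.1;".format(visitor_id)),
    ]
    # safe="" encodes "/", "=", and ";" so each value survives as one parameter.
    query = "&".join(key + "=" + quote(value, safe="") for key, value in params)
    return GA_GIF + "?" + query

url = build_ga_image_call(
    "UA-XXXXX-YY", "brand.facebook.com", "Custom Tab",
    "/facebook/custom-tab", 1234567890123456789)
print(url)
```

The resulting string is what would go into the image call on the tab, with utmn and the visitor ID filled in dynamically rather than hard-coded.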

This won’t work for every tab — you can’t do it on your wall or your Info tab or other pre-defined, unformattable tabs, but if you create a new tab and drop an FBML application in it, you can go nuts with this.

[Update: At almost the exact same time that this post went live, an e-mail hit my inbox with a link to a Google Analytics on Facebook post that I failed to turn up during my research (the post is only a week old, and most of my research happened prior to that). This post includes a handy link generator which looks really promising and helpful.]

Tracking Actions within a Tab

Now, suppose you’ve got your custom tab, and you’ve got tracking to the tab working well. But, you’ve dropped some Flash objects on the tab, and you want to track interactions within Flash. You’ve got two options here:

  • Just use the Actionscript API for Google Analytics — as I understand it, this works fine; I’ve also heard, though, that this adds undue weight to the app (35 KB), and that it’s not super-reliable; but, if you or your Flash developer is already familiar with and using this approach, then knock yourself out
  • Manually generate image calls for each action you want to track — this really just means follow the exact same steps as listed in the prior section, but use Actionscript rather than FBJS for the dynamically generated pieces

Because I work with motivated developers, we went the latter route and built a portable Actionscript class to do the heavy lifting.
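The class itself isn’t shown in this post, so the following is only a guess at the general shape of such a helper, written in Python rather than Actionscript. The class name, method name, and example values are all hypothetical; the idea is just that one visitor ID is generated up front and every tracked action becomes its own image-call URL.

```python
import random
from urllib.parse import quote

class ActionTracker:
    """Builds one image-call URL per tracked action, reusing a single visitor ID."""

    BASE = "http://www.google-analytics.com/__utm.gif"

    def __init__(self, account, hostname):
        self.account = account
        self.hostname = hostname
        # Generated once so every action in the visit shares the same ID.
        self.visitor_id = random.randint(10**18, 10**19 - 1)

    def url_for(self, action_path, action_title):
        """Return the request URL that records this action as a virtual page view."""
        params = [
            ("utmwv", "4.6.5"),
            ("utmn", str(random.randint(100000000, 999999999))),
            ("utmhn", self.hostname),
            ("utmdt", action_title),
            ("utmp", action_path),
            ("utmac", self.account),
            ("utmcc", "__utma=1.{}.1252000967.1252000968.1252000969.1;".format(self.visitor_id)),
        ]
        return self.BASE + "?" + "&".join(k + "=" + quote(v, safe="") for k, v in params)

tracker = ActionTracker("UA-XXXXX-YY", "brand.facebook.com")
play_url = tracker.url_for("/facebook/flash/play-click", "Play Click")
share_url = tracker.url_for("/facebook/flash/share-click", "Share Click")
print(play_url)
print(share_url)
```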

Presumably, you can use FBJS to track non-Flash actions as well, depending on what makes sense.

What About Webtrends?

The same principles described above apply for Webtrends. But, Webtrends has an out-of-the-box “<noscript>” solution, so, rather than reverse-engineering the dcs.gif, you can use a call to njs.gif:

http://statse.webtrendslive.com/<your DCS ID>/njs.gif?dcsuri=/<virtual URL for the page>&WT.ti=<name for the page>

(I did confirm that you can leave off the WT.js parameter that is listed in the Webtrends documentation for using njs.gif).

It also seems like it would make sense to tack on a random number in a parameter at the end (such as “&nocache=<random number>”) just to reduce the risk of caching of the image request (similar to what’s described for the utmn parameter for Google Analytics above). I haven’t even asked for confirmation that that would be useful, but it seems like it would make sense, and it’s just a parameter that Webtrends will ignore in the processing.
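Here is a small Python sketch of building that njs.gif request, including the unconfirmed nocache idea. The function name, the DCS ID, and the page values are placeholders, and “nocache” is just an arbitrary parameter name, not something from the Webtrends documentation.

```python
import random
from urllib.parse import quote

def build_njs_call(dcs_id, virtual_url, page_name, host="statse.webtrendslive.com"):
    """Build an njs.gif request with a throwaway cache-busting parameter."""
    nocache = random.randint(100000000, 999999999)
    # quote() keeps "/" unescaped by default, matching the dcsuri format shown above.
    return "http://{}/{}/njs.gif?dcsuri={}&WT.ti={}&nocache={}".format(
        host, dcs_id, quote(virtual_url), quote(page_name, safe=""), nocache)

# "dcsABC123" is a made-up DCS ID for illustration.
njs_url = build_njs_call("dcsABC123", "/facebook/custom-tab", "Custom Tab")
print(njs_url)
```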

Chances are, you’ll want to set up a new profile in Webtrends that only includes this Facebook traffic (see my opening ramble about Facebook pages being quasi-web sites), and you’ll probably want to filter this traffic out of your various existing profiles. That may mean you need to think about how you are naming your pages to make for some easy Include and Exclude filter creation.

(Oh, yeah, and the “statse.webtrendslive.com” assumes you’re using Webtrends Ondemand — if you’re running Webtrends software, you’ll need to replace this with the appropriate domain.)

As you’ve probably deduced by now, we haven’t really vetted our “njs.gif” usage…yet, but we’ve gotten a lot of head nods from within Webtrends that this should work. I’ll update this post once I’ve got confirmation, but I wanted to go ahead and get the information published so that someone else can run with it and maybe figure it out in more detail and let me know!

Webtrends also, apparently, allows Actionscript to interact with the Webtrends REST API directly, which, allegedly, is an option for action tracking within Flash on Facebook pages. We haven’t confirmed that, and, in what little looking I did on http://developer.webtrends.com, I didn’t turn up any particularly useful documentation, so either that’s not widely in use, or I’m a lousy user of their search function.

It’s Not as Tough as It Looks…but It’s Not Perfect

This may seem a little overwhelming, but the mechanics are really pretty straightforward once you dive in and start playing with it.

To test your work, you don’t need to actually code up anything. Just set up your new profiles (Google Analytics or Webtrends), build up some image request strings, and start hitting them. You can manually swap out the “dynamic” values, and even have some friends or co-workers hit the URLs as well. To introduce a bit of rigor, it’s worth tracking the specific image requests you’re using, how many times you hit them, and from what browser. That way you can compare the results in your web analytics tool to see if you’re getting what you’d expect. Then you can move on to actually getting the calls dropped into a Facebook page.
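One way to keep that log honest is to tally it programmatically. This is only a sketch: the list of hits below is invented test data, and the function name is hypothetical. It counts how many page views you should expect per utmp value so you can compare against what the tool reports.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical log of test hits: (image request URL, browser used).
test_hits = [
    ("http://www.google-analytics.com/__utm.gif?utmn=111111111&utmp=%2Ffacebook%2Ftab-one", "Firefox"),
    ("http://www.google-analytics.com/__utm.gif?utmn=222222222&utmp=%2Ffacebook%2Ftab-one", "Chrome"),
    ("http://www.google-analytics.com/__utm.gif?utmn=333333333&utmp=%2Ffacebook%2Ftab-two", "Firefox"),
]

def expected_page_views(hits):
    """Count how many times each utmp value was requested."""
    counts = Counter()
    for url, _browser in hits:
        params = parse_qs(urlsplit(url).query)
        counts[params["utmp"][0]] += 1
    return counts

expected = expected_page_views(test_hits)
for page, views in sorted(expected.items()):
    print(page, views)
```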

Realize, too, that this whole process is a dumbing down of what normally happens when Javascript or Actionscript is used to tell your web analytics tool that someone has visited the page. Your new vs. returning traffic is going to be inaccurately skewed heavily towards “new.” You’re not going to get browser or OS details (much less whether Javascript is enabled or not). But, you will get basic page views and visits/unique pageviews, and that’s something! You’re stepping back into the Bronze Age of web analytics, basically, but that’s better than the Stone Age, and you’re doing it within social media!

I suspect that you can get a little fancier with FBJS and start to get more robust measurement. As a taste of that, we actually got some tracking working on users’ walls in Facebook, which was both wicked and rad (as the cool kids in the 80s would have said):

  • We posted a status update that was, basically, an invitation to click into a Flash object; if the user clicked into it, then a Flash-based box expanded on their wall, and Google Analytics would be passed an image call to record a page view for the activity
  • We also passed in a “utmv” value, which we then used to set up segments within Google Analytics. The idea is that each of these status updates will be a separate “campaign,” but our campaign tracking will be through custom segments within Google Analytics. That will enable all of our reporting, including the conversion funnels we set up, to be set up once and then re-used through Google Analytics segmentation

Neat, huh? Or, as we’d say in the rural Texas town where I grew up, it’s slicker than greased baby poop. This is giving us highly actionable data — enabling us to see how people are interacting with these experiences through Facebook and enabling us to try different approaches to improve conversion over time! (To be clear, we’re not capturing personally identifiable Facebook information — exactly who is interacting is still invisible to us, which is as it should be).

Fun stuff. If you’ve given anything along these lines a try (or if you’ve successfully taken a totally different tack), please leave a comment — I’d love to get other options added!

All Web Analytics Tools Are the Same (at least when it comes to data capture)

Saturday, February 27th, 2010 by Tim Wilson 7 Comments

I started to write a post on using web analytics tools — Google Analytics, specifically, but with a nod to Webtrends as well — to track traffic to custom tabs and interactive elements on Facebook pages. But, as I started thinking through that content, I realized that I needed to back up and make sure I had a good, clean explanation of a key aspect of the mechanics of page tag-based web analytics tools. I poked around on the interweb a bit and found some quick explanations that were accurate, but that really weren’t as detailed as I was hoping to find.

Regardless of whether you’re trying to track Facebook or not, it’s worth having a good, solid understanding of these underlying mechanics:

  • If you’re a web analyst, understanding this is like understanding gravity if you’re a human being — there are some immutable laws of the internet, and knowing how those laws drive the data you are seeing will open up new possibilities for capturing activity on your site
  • If you’re a developer, then this will be a quick read, but understanding it will make you the hero to both your web analysts and (assuming they’re not glory hogs) the people they support with their analysis, because you will be able to suggest some clever ways to capture useful information

By the end of this post, you should understand both the title and why the URLs listed below are what make it so:

  • Google Analytics = http://www.google-analytics.com/__utm.gif
  • Webtrends = http://statse.webtrendslive.com/<ID>/dcs.gif
  • Sitecatalyst = https://<custom domain>/b/ss/<account name>/1/<code version>/<random ID>
  • Coremetrics = http://<custom domain>/cm or http://<custom domain>/eluminate

I’ve been deep under the hood with both Google Analytics and Webtrends for this, but the same principles apply to all tools (because they’re all bounded by the Physics of the Internet). I’m going to talk about Google Analytics the most in-depth, because it has the largest market share (measured by number of sites tagged with it), and I’ll try to call out key differences when appropriate.

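Let’s start with a simple picture of how all of these tools work. When a visitor comes to a page on your site, the following sequence of events happens:

  1. Javascript figures out stuff about the visitor
  2. Javascript packages that info into a single string of information
  3. Javascript makes an image request with that string tacked on the end
  4. The web analytics tool reads the string and puts the information into a database
  5. The web analyst queries the database for insights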

Steps 2 and 3 are really the crux of the biscuit, but we need to make sure we’re all clear on the first step, too, before getting to the fun there.

1 – Javascript figures out stuff about the visitor

We all know what Javascript is, right? It’s one of the key languages that can be interpreted by a web browser so that web pages aren’t just static text and images: dropdown menus, mouseovers, and such. But, Javascript also enables some things to go on behind the scenes. The basic data capture method for any tag-based web analytics tool is to run Javascript to determine what page the visitor is on, what relevant cookies are set on the user’s machine, whether the visitor has been to the site before, what browser the visitor is using, what language encoding is set for the browser, the user’s screen resolution, and a slew of other fairly innocuous details. This happens every time a visitor views a page running the page tag. So, great — a visitor has viewed a page, and the Javascript has figured out a bunch of details about the visitor and the page. Now what? It’s on to step 2!

(I realize I’m saying “Javascript” here, and most tools also have Actionscript support for tracking activity within Flash — for the purposes of this post, I’m just going to stick with Javascript, but I’ll get back to Actionscript in my next post!)

2 – Javascript packages that info into a single string of information

The next step is pretty simple, but it’s where the magic starts to happen. Let’s say the Javascript in step 1 had figured out the following information about a visitor to a page:

  • Site = www.gilliganondata.com
  • Page title = The Fun of Facebook Measurement
  • Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/
  • Browser language = en-us

Converting that info into a single string is pretty straightforward. Let’s start by pretending we’re going to put it into a single row in a pipe-delimited file. It would look like this:

Site (hostname) = www.gilliganondata.com | Page name = The Fun of Facebook Measurement | Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | Browser language = en-us

Now, rather than using the pretty, readable names for each of the four characteristics of the page view, let’s use some variable names (these are the Google Analytics variable names, but the documentation for any web analytics tool will provide their specific variable names for these same things):

  • Site (hostname) –> utmhn
  • Page title –> utmdt
  • Page URL –> utmp
  • Browser language –> utmul

So, now our string looks like:

utmhn = www.gilliganondata.com | utmdt = The Fun of Facebook Measurement | utmp = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | utmul = en-us

We used pipes to separate out the different variables, but there’s nothing really wrong with using something different, is there? Let’s go with using “&” instead and eliminate the spaces around equal signs and the delimiters. The single string now looks like this:

utmhn=www.gilliganondata.com&utmdt=The Fun of Facebook Measurement&utmp=/index.php/2010/01/11/the-fun-of-facebook-measurement/&utmul=en-us

Now, we’ve still got some “special” characters that aren’t going to play nice in Step 3, namely spaces and “/”s, so let’s replace those characters with the appropriate URL encoding (%20 for the spaces and %2F for the “/”s):

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us

It looks a little messy, but it’s a single, portable string that has the exact information that was listed in the four bullets that started this section. While it might be painful to reverse-engineer this string into a more reader-friendly format by hand, it’s a snap to do programmatically (which is exactly what web analytics tools do…as we’ll discuss in step 4) or in Excel.
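The progression from readable labels to the encoded string can be sketched in a few lines of Python. This is just an illustration of the string handling, not how the Google Analytics Javascript is actually written; the variable names are my own.

```python
from urllib.parse import quote

# The four values from the bullets above, keyed by their Google Analytics names.
page_view = [
    ("utmhn", "www.gilliganondata.com"),
    ("utmdt", "The Fun of Facebook Measurement"),
    ("utmp", "/index.php/2010/01/11/the-fun-of-facebook-measurement/"),
    ("utmul", "en-us"),
]

# Pipe-delimited, with spaces around the equal signs and delimiters.
pipe_delimited = " | ".join("{} = {}".format(k, v) for k, v in page_view)

# Same pairs, "&" as the delimiter and no extra spaces.
amp_delimited = "&".join("{}={}".format(k, v) for k, v in page_view)

# safe="" so that "/" becomes %2F; spaces become %20.
encoded = "&".join("{}={}".format(k, quote(v, safe="")) for k, v in page_view)

print(pipe_delimited)
print(amp_delimited)
print(encoded)
```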

Before we move on, let’s tack one more parameter onto our string. This is something that is actually hard-coded into the Javascript, and it identifies which web analytics account this traffic needs to go to. In the case of this blog, that account ID is “UA-2629617-3” and the variable Google Analytics uses to identify the account parameter is “utmac.” I’ll just tack that on the end of our string, which now looks like:

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

A subtle point: what we’ve really done above is to combine all the information into a single string with a series of “key-value pairs.” In the case of the first variable, the “key” is “utmhn” and the “value” is “www.gilliganondata.com.” Notice that both the key AND the value are included in the string. If you’ve worked with comma-delimited or tab-delimited files, then you might be wondering why the key is included. Why can’t the Javascript always pass in the variables in the same order, and the web analytics server would know that the first value is the hostname, the second value is the title, and so on? There are at least three reasons for this:

  • It just generally makes the process more robust because it reaffirms to the server exactly what each value means at the point the server receives the information; the internet is messy, so hiccups can happen
  • Most “advanced” features when it comes to capturing web analytics data rely on tacking on additional parameters to the master string — by including both the key and the value for every parameter, that fanciness doesn’t have to worry about the order the parameters are passed in, AND it means the custom parameters get viewed/processed exactly the same way that the basic parameters do
  • The “key-value pairs separated by the & sign” are standard on the internet. Go to any online retail site and poke around, and you will see them in the URL. It’s kind of a standard way to transmit a series of variables onto the back end of a web page or image request, and that’s really all that’s going to happen in step 3
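To illustrate the order-independence point, here is a short Python sketch using the standard library’s query-string parser. The “myparam” key is a made-up stand-in for a custom parameter; it is not a real Google Analytics variable.

```python
from urllib.parse import parse_qs

standard = "utmhn=www.gilliganondata.com&utmul=en-us&utmac=UA-2629617-3"
# Same pairs in a different order, plus a hypothetical custom parameter.
reordered = "myparam=42&utmac=UA-2629617-3&utmul=en-us&utmhn=www.gilliganondata.com"

first = parse_qs(standard)
second = parse_qs(reordered)

# Because every value travels with its key, the order makes no difference.
print(first["utmhn"], second["utmhn"])
# The custom parameter is read exactly the same way as the basic ones.
print(second["myparam"])
```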

We’ve got our string, so now let’s do something with it!

3 – Javascript makes an image request with that string tacked on the end

Somehow, we need to pass that string back to the web analytics server. We do that by making an image call. In the case of Google Analytics that image request is always, always, always exactly the same, no matter the site using Google Analytics:

http://www.google-analytics.com/__utm.gif

Just like we covered in the “online retail site” URL structure discussion at the end of the last section, we’re going to tack some parameters on the end of the __utm.gif request. The standard way to take a base URL and tack on parameters is to add a “?” followed by one or more key-value pairs that are separated by an “&” sign. Lucky for us, the “&” sign is what we used when we were building our string in the last section! So:

http://www.google-analytics.com/__utm.gif

+

?

+

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

=

http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

Wow, that looks messy, but it just looks messy — it’s actually quite clean! In reality, there are way more than five parameters tacked onto the image request. As a matter of fact, the request above would really look more like this:

http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1516518290&utmhn=www.gilliganondata.com&utmcs=UTF-8&utmsr=1920x1080&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmdt=The%20Fun%20of%20Facebook%20Measurement%20%7C%20Gilligan%20on%20Data%20by%20Tim%20Wilson&utmhid=1640286085&utmr=http%3A%2F%2Fgilliganondata.com%2F&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmac=UA-2629617-3&utmcc=__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2%3B%2B__utmz%3D116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1

You can get a complete list of the Google Analytics tracking variables from Google (if you’re really into this, check out the utmcc value: that actually is a single parameter that includes multiple sub-parameters, which are separated by “%3B,” a URL-encoded semicolon, instead of an “&”; these are the user cookie values, which you can find towards the end of the long string above if you look for it). You can inspect the specific calls using any number of tools. I like to use the Firebug plugin for Firefox, but Fiddler is another free tool, and Charles is the standard tool used at my company. And, there’s always WASP to provide the “clean” view of the parameters (I use WASP heavily…unless I’m trying to reverse-engineer the specific calls being made for some reason).
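If you would rather pull the utmcc value apart programmatically than by eye, a sketch along these lines should work. It uses a shortened version of the request (only utmac and utmcc are kept), and the variable names are my own.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the long request, keeping only utmac and utmcc.
request = (
    "http://www.google-analytics.com/__utm.gif?utmac=UA-2629617-3"
    "&utmcc=__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2%3B"
    "%2B__utmz%3D116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)"
    "%7Cutmcmd%3D(none)%3B"
)

# parse_qs splits on "&" and URL-decodes each value.
params = parse_qs(urlsplit(request).query)
utmcc = params["utmcc"][0]

# Inside utmcc, the sub-parameters are separated by ";" (with a leading "+").
cookies = {}
for part in utmcc.split(";"):
    part = part.lstrip("+")
    if part:
        name, _, value = part.partition("=")
        cookies[name] = value

for name, value in cookies.items():
    print(name, "->", value)
```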

The Javascript makes a request for that URL. This is the infamous “1×1 image.” Just to sharpen the edges a little bit on some common misconceptions about that image request:

  • The request for the image is what matters. While the 1×1 image will get delivered back, by the time www.google-analytics.com actually sends out the image, the page view has already been counted. As a matter of fact, if there was no __utm.gif image, the traffic would still get counted simply by virtue of the fact that the Google Analytics server received the image request. As it happens, some other little user experience hiccups can happen if there’s no actual image, but the existence of the file matters nary a bit from a data capture perspective!
  • Yes, you can actually just request the image directly from your browser. Go ahead — here’s the URL as a hyperlink: http://www.google-analytics.com/__utm.gif (yeah, it’s something of a letdown, but now you can say you’ve done it)
  • The 1×1 size isn’t what keeps the image from being noticed by the user. If Google got a wild hair to replace the __utm.gif image with a 520×756 pixel image of a psychedelic interpretation of the Mona Lisa…no one would ever see the change (unless they were doing something silly like calling the image directly from their browser as described in the previous bullet). The image gets requested by the Javascript, but it never gets displayed to the user. It’s sort of like a Javascript dropdown menu: the text for the dropdown gets loaded into the browser memory so that, if you mouse over the menu, the text is already there and can be displayed immediately. The __utm.gif request is the same way…except there’s nothing in the Javascript that ever actually tries to render the image to the user

And one more point: While we’ve been talking about “image requests” here, it doesn’t have to be an image request per se. In the case of Google Analytics, it is. In the case of Webtrends, it is, too (the image is called dcs.gif). In the case of other web analytics packages, it’s not necessarily an image request, but it is a request to the web analytics server. What matters is understanding that there are a bunch of key-value pairs tacked on after a “?” in the request, and that’s where all of the fun information about the visit to the page gets recorded and passed.

4 – Web analytics tool reads the string and puts the information into a database

So, the web analytics server has been getting bombarded with the requests from Step 3. Can you see how straightforward it is for software to take those requests and split them back out into their component parts? That’s the easy part. Where the tools really differentiate themselves is how exactly they store all of that data — the design of their database and then how that data is made available for queries and reports by analysts.
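
To make that "split them back out" step concrete, here is a minimal Python sketch. It is not how Google Analytics or any other vendor actually processes hits; it only shows that a request like the ones described in Step 3 comes apart with standard URL parsing. The request URL and every value in it are made up for illustration, and parse_hit and split_cookies are hypothetical helper names.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical, shortened request; all values are invented for illustration.
request_url = (
    "http://www.google-analytics.com/__utm.gif"
    "?utmwv=4.6.5&utmn=1234567890&utmhn=www.example.com"
    "&utmdt=Example%20Page&utmp=%2Fblog%2F"
    "&utmcc=__utma%3D1.2.3.4.5.6%3B%2B__utmz%3D1.7.1.1.utmcsr%3D(direct)%3B"
)

def parse_hit(url):
    """Split the key-value pairs after the '?' into a dict (values are URL-decoded)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}

def split_cookies(utmcc):
    """Split the decoded utmcc value into its semicolon-separated cookie values."""
    cookies = {}
    for part in utmcc.split(";"):
        part = part.lstrip("+").strip()  # a "+" joins the cookies in this sample
        if part:
            name, _, value = part.partition("=")
            cookies[name] = value
    return cookies

hit = parse_hit(request_url)
print(hit["utmhn"], hit["utmp"], hit["utmdt"])
print(split_cookies(hit["utmcc"]))
```

Once the hit is a plain dictionary, the hard part is the one described above: deciding how to store millions of these so they can be queried sensibly.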

Back in the day (and I assume it’s still an option), Webtrends would make the raw log files available to their customers as an add-on service. That was handy — once we understood the basics of this post and the Webtrends query parameters, we were able to sift through for some juicy nuggets to supplement our “traditional” web analytics (these were in the days before Webtrends had their “warehouse” solution, which would have made the same information available).

5 – Web analyst queries the database for insights

Like step 4, this is an area where web analytics tools really differentiate themselves. In the case of Google Analytics, there is the web-based tool and the API. In the case of paid, enterprise-class tools, there are similar tools plus true data warehouse environments that allow much more granular detail, as well as two-way integration with other systems.

Why Understanding This Matters

You’re still reading, so maybe I should have made this case earlier. But, the reason this matters is because, once you understand these mechanics, you can start to do some fun things to handle unique situations. For instance, what do you do if you have Google Analytics, and you want to track activity somewhere where Javascript won’t run (like…um…your Facebook fan page — that’ll be my next post!). Or, more generally, if you’re Googling around looking for ways to address some sort of one-off tracking need, you’ll understand the explanations that you’re finding — these solutions invariably involve twiddling around within the framework described here.
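
As a rough illustration of that kind of twiddling, the sketch below builds an image request by hand in Python, with no Javascript involved. Treat it as a sketch under assumptions rather than a working tracking recipe: the parameter names are the ones from the legacy __utm.gif request discussed in this post, but which of them the server strictly requires is an assumption, and the account ID, hostname, page path, and version string are all placeholders. build_hit_url is a hypothetical helper name.

```python
import random
from urllib.parse import urlencode, quote

BASE_URL = "http://www.google-analytics.com/__utm.gif"

def build_hit_url(account, hostname, page, title):
    """Assemble an image-request URL with the key-value pairs after the '?'."""
    params = {
        "utmwv": "4.6.5",                              # placeholder tracking-code version
        "utmn": str(random.randint(0, 2**31 - 1)),     # random number to defeat caching
        "utmhn": hostname,                             # hostname being reported
        "utmp": page,                                  # page path being reported
        "utmdt": title,                                # page title
        "utmac": account,                              # account ID (placeholder here)
    }
    # quote_via=quote encodes spaces as %20 and slashes as %2F
    return BASE_URL + "?" + urlencode(params, quote_via=quote)

url = build_hit_url("UA-XXXXX-X", "www.example.com", "/facebook/fan-page", "Fan Page")
print(url)
```

The resulting URL could be dropped into a plain image tag, which is the general idea behind tracking in places where scripts are not allowed to run.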

As I read back through this post before publishing it, I was struck by how far into the tactical mechanics of web analytics it is. The overwhelming majority of web analytics blog posts focus on step 5 and beyond — how to use the data to be an analysis ninja rather than a report monkey. Understanding the mechanics described here is a foundational step that will support all of that analysis work. I was incredibly fortunate, early in my web analytics career, to have an opportunity to run the migration from a log-based web analytics package to a tag-based solution. I was triply fortunate that I worked on that migration with two brilliant and patient IT folk: Ernest Mueller as the web admin (who regularly shares his knowledge these days as a contributor to http://www.webadminblog.com/) supporting the effort, and Ryan Rutan, the developer supporting the effort — he was hacking the Webtrends page tag before the consultant who we had on-site to help implement it had finished his first day. Ernest drew countless whiteboard diagrams to explain to me “how the internet works” (those “immutable laws” I mentioned early in this post), while Ryan repeated himself again and again until I understood this whole “image request with parameters” paradigm.

If you’re a web analyst, seek out these types of people in IT. A hearty collaboration of cross-discipline skills can yield powerful results and be a lot of fun. I had similar collaborations when I worked at Bulldog Solutions, and the last two weeks saw the same thing happening at my current gig at Resource Interactive. Those are pretty energizing experiences that leave me scratching my head as to why so many companies wind up with an adversarial relationship between “the business” and “IT.” But THAT is a topic for a whoooollllle other post that I may never write…

Columbus Web Analytics Wednesday — Feedback Analysis

Tuesday, February 23rd, 2010 by Tim Wilson No Comments

It’s been a crazy month work-wise. As a result, I’ve been chalking up thoughts in my head that I’d love to get written down. One of those thoughts is actually around time management and prioritization, which I’ve been pondering in the context of, among other things, my blogging…but that’s not the subject of this post!

Following our last Web Analytics Wednesday (and somewhat in preparation for our next one), I put out a survey to my list of 150+ past registrants. I used Google Documents for the survey, which is the second or third time I’ve used Google as a survey tool over using the free version of Survey Monkey or some other service. I think I’m hooked. While you don’t have a whole lot of control over how the data gets stored in the underlying Google spreadsheet, the user experience is pretty clean, which I like. And the data can be exported straight from the Google spreadsheet and manipulated in Excel (I know I could theoretically evaluate it within the Google spreadsheet, but I’ve never managed to spend enough time with Google Docs to really have the agility I’d like there).

Who Answered?

I received 21 responses to the survey, which is a 13% response rate. Not bad! The main “profile” question I asked was how many Columbus WAWs the respondent had attended. The results showed a pretty even mix from “little to none” to “some or a lot:”

That’s good, as I feel like the results are pretty representative of the population we’re trying to serve. I didn’t go deep in the survey as to company, industry, role, etc. — we’ve got a pretty good feel for that, and I knew I wasn’t going to go nuts with trying to segment the results as part of the analysis.

What Attendees Are Looking to Get Out of WAWs

So, what are attendees looking to get out of Columbus WAWs? This question had 3 options for each category: “Not really,” “Sort of,” and “Very Much So” (my friends at Foresee Results would probably tell me that a 10-point scale without intermediate labels would have been a much cleaner method…but I’ve never claimed to be an expert in this sort of thing). The chart below shows the “Very Much So” and the “Sort of” responses (I segmented by the number of times people had attended and did not glean anything of note there):

Networking came out at the top of the list, although it was a virtual tie with increasing web analytics knowledge. That’s great, as these are two of the core goals for WAWs. We need to keep doing a “pure networking” event here and there. A challenge with those events is with the sponsorship — if we get a sponsor other than the Web Analytics Wednesday Global Sponsors, we really need to give them a forum to talk. All of our past sponsors have been great about not making their presentations “sales pitches” — we get good, practical content from them. But, they get to solidify their positions as experts in an area.

Dave Culbertson and I have discussed several times that WAWs seem to draw SEO/SEM-interested people as much as web analytics-interested people. The disciplines have a heavy overlap, so that’s not a surprise. The survey results back this up, so we will continue to incorporate search-oriented topics.

I was a bit surprised by the low number of people who indicated “find a job,” as it seems like I talk to one or two people each month who are between opportunities. That may be the result of an imperfectly sampled population.

And, it’s good to know that there’s a healthy interest in drinking good beer (although that puts a tough constraint on the venue selection, which I’ll touch on later).

WAW Scheduling

On a highly practical front, we occasionally get feedback that Wednesday evenings are a bad time for a person — we’ve had some past regulars who simply haven’t been able to attend due to commitments elsewhere on Wednesday evenings (pool league, hockey league, teaching CCD, etc.). When we first started WAWs in Columbus, we held them on Tuesdays for this very reason, but a survey last year showed a shift to Wednesdays would work better.

In this survey, Wednesday dinners did come out at the top of the pack:

Now, there very well may be survey bias in the responses to these questions, because we’ve been consistently holding WAWs on Wednesdays over happy hour/dinner, which means most of the people invited to participate in the survey had registered for an event at that time. But, that’s the only group I have easy access to in order to survey, so I’m running with it.

As one more check (and somewhat just for the data visualization challenge of it), I cross-tabbed these two questions and put it on a bubble chart:

Again, Wednesday dinners are the clear winner. What’s a little troubling is that only 2/3 of the respondents indicated this combination was good for them. We’ll have to grapple with that a bit — you can’t please all the people all the time, certainly, but we also don’t want to shut out people all the time due to structural conflicts. My take is that we can definitely steer clear of Fridays and we should stick with “after work” time slots. But, we may try mixing up the days of the week a bit.

Data visualization side note: the way I represented the data above works okay, I think, but it also is a good exercise in showing one of the reasons that pie charts are evil. The number inside each circle shows how many respondents had answers that fell in both categories. Compare the size of a “1” to the size of the “14” — does it look to you like the larger circle is fourteen times as big as the smaller one? It doesn’t to me. In this case, the bubbles have the values labeled inside of them, partly because the pure visualization seemed misleading. Human beings are notoriously bad at interpreting 2-dimensional areas.
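
The arithmetic behind that is worth a quick look. The Python sketch below assumes the charting tool sizes each bubble so that its area is proportional to the value (an assumption about the tool, not something I verified for the chart above). Under that assumption, the “14” bubble really does have fourteen times the area of the “1” bubble, but it is only about 3.7 times as wide, and width is what the eye tends to latch onto.

```python
import math

def radius_for_value(value, unit_radius=1.0):
    """Radius of a bubble whose AREA is proportional to value."""
    return unit_radius * math.sqrt(value)

small = radius_for_value(1)
large = radius_for_value(14)

width_ratio = large / small                                    # how much wider it looks
area_ratio = (math.pi * large ** 2) / (math.pi * small ** 2)   # how much bigger it "is"

print(round(width_ratio, 2))  # 3.74
print(round(area_ratio, 2))   # 14.0
```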

Communication

We’ve got a wide range of ways we promote WAWs, so I wanted to get a sense as to which ones people preferred.

How do you prefer to stay informed of upcoming WAWs?

The only surprise here was that “Running into Dave Culbertson” was at the bottom of the list! Of course, I didn’t ask Dave to mention to people he ran into that this survey was posted, so there’s that pesky survey bias again. We’ll keep up the e-mails (in almost two years of building up the Columbus WAW database, we’ve had a total of 2 opt outs, so I’ll keep the frequency of communication about the same, as it seems to be working).

Open-Ended Feedback

A number of respondents took the time to provide detailed thoughts on the event overall, and, specifically, on the request for other venue suggestions.

I have an infatuation with Wordle at the moment (specifically when it comes to certain types of online listening), which is one of those subjects for a future post. Below is a Wordle of the general feedback responses — I can’t help but smile when I look at it:

A summary of some of the specifics in the general feedback:

  • There were several suggestions that we occasionally have practitioners rather than vendors present: case studies, best practices, or even peer problem-solving sessions
  • There was a suggestion to try a round table or un-conference format around the state of SEM/SEO/analytics
  • One respondent suggested a competition of sorts — having attendees bring their “best stuff” on a topic or a challenge; maybe even trying to have a prize of some sort to the “winner”
  • “If you could get Avinash Kaushik to speak that would be SWEET!”
  • One person noted that our topics tend to be very consumer brand-oriented (which is true), and that it would be nice to have content that is more general and that could be applied to B2B
  • There was a pretty healthy level of general gushing about the quality and value of the event

One person noted in the general feedback that “I like Barleys as a venue- that room has nice square dimensions that keep the energy together- you can’t really get stuck off to the side.” We knew Barley’s was good, but this was a fresh perspective on one of the reasons as to “why.” On the “alternative venue suggestions” question, several people commented that any location needed to be central (as Barley’s is). The only specific alternative venue suggested was Spaghetti Warehouse, which we’ve used in the past. It is centrally located, and it has a pretty good meeting spot (we’ve been seated upstairs), and it’s quiet enough to have conversations. The two downsides are: 1) the area where it’s located (although they do have a security guard posted in the parking lot at all times), and 2) the beverage selection. One of the responses to the venue question was: “Anywhere with ‘Good Beer’!” Spaghetti Warehouse definitely falls short on that front, but at least it’s not entirely dry!

We may give that another shot.

Feedback Is ALWAYS Welcome

If you didn’t participate in the survey (or if you did but have other comments), please leave a comment or drop me an e-mail (“tim” at this site’s domain).

A Record-Setting Web Analytics Wednesday in Columbus with CRM Metrix

Tuesday, February 2nd, 2010 by Tim Wilson 1 Comment

Last week’s Columbus Web Analytics Wednesday set a new record for the meetup — we had exactly FIFTY attendees, which was a great showing. Part of the large draw was undoubtedly the event sponsor, CRM Metrix (@crm_metrix on Twitter).

Pre-Meal Networking (and a Friendly Wave from Jonghee!)
Columbus Web Analytics Wednesday -- Jan 2010

Hemen Patel, CRM Metrix CTO, facilitated a lively discussion about incorporating the voice of the customer in web site measurement and optimization.

Hemen Patel Presents

Hemen walked through a brief deck (below) that sparked some great back-and-forth with the crowd.

A Rapt Audience

Monish Datta Asks a Question

With a crowd of fifty people, not only did I not get to meet the first-time attendees, but I barely had a chance to say, “Hi” to some of the long-time regulars. I guess we’ll just have to have another one in February (I’m working on it!) so I’ll get that chance!

Facebook Measurement: Impressions from Status Updates

Wednesday, January 27th, 2010 by Tim Wilson 4 Comments

In my last post, one of the challenges I described was that it was impossible to get a good read on the number of impressions a brand was garnering from their fan page status updates — a status update on a fan page appears in the live feeds of the page’s fans…assuming the fan hasn’t hidden updates from that page and the fan logs in to Facebook and views his/her live feed before there are so many new updates from his/her other friends that the status update has slid down into oblivion.

A lot has changed since that post! A few days after I published it, Nick O’Neill reported that a Facebook staffer had let the cat out of the bag during a presentation in Poland and announced that impression measurement was on the way. And, last Thursday, it became official. IF you have an authenticated Facebook page (at least 10,000 fans and you’ve authenticated the page when prompted), you now get (with some delay) something like this underneath each of your status updates:

Pretty slick, huh?

First, Impressions

I’ll be the first to say that “impressions” is a pretty loose measure — it’s a standard in online advertising, and it became the go-to measure there because print and TV have historically been so eyeball-oriented. It’s not a great measure, but it does have some merit. I’ll even go so far as to claim that a Facebook impression is “heavier” than a typical online display ad (be it on Facebook or some other site), because many online display ads are positioned somewhere on the periphery of the page where we’ve trained ourselves to tune them out. A Facebook impression is in the fan’s feed.

Of course, the other way to look at it is that it’s only showing up for people who are already fans of your page, which, presumably, are people who already have an affinity for your brand (although, considering that “fan” is short for “fanatic”…methinks the meaning of the term has evolved to be a much lesser state of enthusiasm than it was 20 or 30 years ago). So, it’s not much of a “brand awareness”-driving impression.

Facebook’s note on the subject gives a pretty clear definition of how impressions are counted:

…the number of impressions measures the number of times the post has been rendered on user’s [sic] browsers. These impressions can come from a user’s news feed, live feed, directly from the Page, or through the Fan Box widget. This includes instances of the post showing up below the fold.

Clear enough. This will be really useful information for sifting through past status updates and seeing which ones garner the highest impressions per fan to determine what day (and time of day) is optimal for getting the broadest reach for the update (remember that impressions have nothing to do with the quality of the content — it’s just a measure of how many people had that post rendered in their browser). Juicy stuff. The impression count will (or should…Facebook metrics have a record of being a little shifty) only go up over time. So, to get a good handle on total impressions, you’ll have to let the update be out there for a few days or a week before it really closes in on its top end.

% Feedback

So, what about that “% Feedback” measure? This is a good one, too — it’s actually a tighter measure of “post quality” than the Post Quality measure provided through Facebook Insights (Post Quality is vaguely defined by Facebook as being “calculated with an algorithm that takes into account your number of posts, total fan interactions received, number of fans, as well as other factors.” Yeesh!). It’s simple math:

(Likes + Comments) / Impressions

What percent of people not only had the status update presented to them, but also reacted to it strongly enough to take an action in response to the post? In the screen cap above: (11 likes + 9 comments) / 31,895 impressions = 0.06% Feedback. Is that good or bad? It’s too early to tell (the same page that I pulled the above from had another status update with a 1.62% Feedback value), but I like the measure as a general idea. And, it’s easy to understand and recreate, so all the better. It is a measure of the quality of the content (although, in theory, a status update could go out that really upset a lot of people, which could drive a high % Feedback score by attracting a lot of negative comments).

I’m a little bothered by combining Likes and Comments. To me, a Like is a much lower-weighted interaction than a Comment — a like is a “I read it and agree enough to click a link while I move along” reaction, whereas a comment is a “I read it and had a sufficiently strong reaction to form a set of words and take the time to type them in” reaction. But, for the sake of simplicity, I’m good with combining them. And, the calculation is so simple that it would be easy enough to manually calculate a different measure.
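
Since the data has to be gathered by hand anyway, the calculation is easy to recreate. Here is a small Python sketch of the % Feedback math using the numbers from the screen cap, plus an optional weighting for anyone who shares my qualms about treating a Like and a Comment as equals. The weights are purely hypothetical (nothing Facebook provides), and pct_feedback is a made-up function name.

```python
def pct_feedback(likes, comments, impressions, like_weight=1.0, comment_weight=1.0):
    """(Likes + Comments) / Impressions as a percentage, with optional weights."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    weighted = like_weight * likes + comment_weight * comments
    return 100.0 * weighted / impressions

# Facebook's own definition, using the numbers from the screen cap
standard = pct_feedback(11, 9, 31895)
print(f"{standard:.2f}%")  # 0.06%

# A hypothetical alternative that counts a Comment as three Likes
weighted = pct_feedback(11, 9, 31895, like_weight=1.0, comment_weight=3.0)
print(f"{weighted:.2f}%")
```

A weighted version is no longer a true percentage of impressions, so it is only useful for comparing one status update to another.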

As far as I can tell (so far), Facebook isn’t providing a way to get “overall impressions and % Feedback” measures by day through Facebook Insights. The data is available on a “by update, manually gathered” basis only. But, I don’t want to be difficult — I love the progress!

The Fun of Facebook Measurement

Monday, January 11th, 2010 by Tim Wilson 11 Comments

If you are a marketer, Facebook is important — the number of active users of the site exceeds the population of the United States, and its growth is going to do nothing but increase. Check out the Facebook statistics page for a slew of numbers that are all…big. Because of the growth of Facebook as a critical marketing channel, a hot topic around the office right now is “Facebook measurement.” It’s a tricky topic that reminds me of the early days of web analytics: there’s some basic stuff that’s easy to measure, and it’s basically helpful, but there’s a lot more that can’t be measured or can’t be measured well, and that’s where the real value is.

There are (at least) five different aspects of Facebook that can be measured:

  • Facebook display ads — I’m not going to cover that at all; I haven’t spent a whole lot of time digging into it with our clients, so I’m not going to write about it. Check out What about advertising on Facebook? over on the Site Pro Specialties site for a quick overview of the ins-and-outs and their experience with Facebook display ads
  • Facebook applications — I’m not going to cover this, either, largely because my experience on the subject is pretty limited, but also because Facebook apps annoy the bejeezus out of me, and I don’t want to have to stifle a gag reflex by writing about measuring them
  • Facebook groups — Holy cow! Another topic I’m not going to cover! Since Facebook pages came on the scene, that’s where brands tend to be living more, and Facebook provides more measurement support for pages, so that’s where I’m going to focus the most
  • Facebook pages — I’ll focus on these quite a bit, as this is an area that brands are really starting to settle into as a formal presence on Facebook
  • General Facebook activity – this is an area where measurement is highly limited, but I’ll lay out what is there and what I hope comes sooner rather than later

We’re in a bit of an ugly period from an analyst’s perspective, in that Facebook hasn’t made supporting marketers a high priority beyond paid advertising. And, the company is being very cautious on the privacy front (which, from a consumer’s perspective, is a good thing!). The easiest way to reduce the risk of a PR blowup from misuse of Facebook data is to limit the availability of that data to marketers. I can’t blame them, but it doesn’t mean they’ve made my life easy on that front.

Ready? Let’s go!

Facebook Pages — Fan Count

Facebook pages are a way for brands to establish a formal, managed presence on the site. They’re easy to set up, and they can range from the very simple and unused (see the Smuirfield Golf Club page) to the very elaborate and active (see the Victoria’s Secret PINK page). For any page, regardless of whether you are an admin for it or not, you can see the total number of fans at a point in time — the example below is from the Slate Political Gabfest page:

That can be useful for a couple of reasons:

  • Organically grown pages — it’s fairly common for major brands to have their fans set up pages and grow a decent following; being able to tell the reach of those pages can help identify when outreach or integration might be in order
  • Competitive research — it can be tedious, but assessing the size and growth of competitor fan pages over time can provide insight (albeit limited insight) into their overall social media strategy and their ability to execute

There is no way to measure the change in any page’s fan count over time other than periodically going and checking and recording it. And, what does total fans tell you? It tells you something…but not as much as you might like. More on that later.

Facebook Pages — Facebook Insights Data

Now, if you have admin access to a Facebook page, you can get much richer data, and you can get a historical view of some of that data. On the page itself, above the Fans box, is the basic Facebook Insights box:

While this looks encouraging, it’s not particularly useful. “Post Quality” sounds like a good idea (pick any measure of activity volume, and you can say, “It’s not just about quantity — it’s about quality!” and sound smart), but exactly how Facebook determines quality is a bit of a mystery. From the Facebook Help Center:

The Post Quality score measures how engaging your Posts have been to Facebook users over a rolling seven-day window.

Post Quality is an important indicator for how fans gauge your posts. This score is calculated with an algorithm that takes into account your number of posts, total fan interactions received, number of fans, as well as other factors.

It’s a measure that’s almost too vague to be useful. And, in practice, the historical trending of Post Quality shows that something about the way it is measured makes it pretty non-actionable — even for pages that have a high level of fan engagement consistently, a trendline of Post Quality goes all over the place.

So, now we dive into the real meat of Facebook Insights, which initially looks like a nice, juicy T-bone, but which turns out to be more like a pretty lean cut of venison. The See All link in the Insights box brings up the main Facebook Insights page (click on the image below to view a larger version):

This page has the second not-nearly-as-useful-as-you’d-like measure: Active Fans. Facebook is even more fuzzy about how this is calculated than it is about Post Quality. And, historical data is not available. In my experience, Active Fans is a pretty big crap shoot — it varies widely from day to day and, since it’s not easy to get historical data, it’s a mess to try to analyze what is going on and how it is really changing over time with any granularity. Conceptually, active fans are high-quality fans. In my experience, the number of active fans in any given period is a tiny fraction of the overall fans. So, the million-dollar question — “What is the value of a Facebook fan?” — should probably include a separate calculation for an “active fan.” But, “active fan” is such a messy measure with such limited availability, that it’s barely worth pursuing until it’s more accessible and explainable.

Most of the other measures, though, have historical data available via the graphs shown on the page. Some underlying data can be exported as a CSV or Excel file with granularity at the individual day level. Two wrinkles with that data, though:

  • The timing of the data updates is inconsistent, and it doesn’t seem like “if data is there, it’s good data” — a note in the bottom of the Insights window states: “Please allow 48 hours for data to be available for a daily report;” it’s common to see some data for a given day populated while other data for the same day isn’t; while I don’t feel like “real-time” data is generally warranted, the 48-hour lag can put a real crimp in effective weekly reports, as well as in getting a good, timely view into the results of a new Facebook campaign
  • The data doesn’t appear to be kept forever; it used to seem like data dropped off once it was ~3 months old, but the actual range of available data seems to vary, and Facebook doesn’t provide information on the subject; we’re in the practice of exporting all available data monthly so that we’ve got it retained offline for our clients

The main export option is the Fans and Interactions export. The other two exports that are available are Demographics and Country. The demographics export simply shows, by day, the number of fans of a given age range/gender. The demographics of active fans over time is not available, unfortunately. The Country export simply shows the number of fans from each country over time.

Now, Fans and Interactions is where the most useful information is. You can get a great look into how the fan count has been changing over time — new fans, total fans, unsubscribes, etc. This provides a way to do a classic “leaky bucket” report — how many fans you are losing compared to how many new fans you are acquiring. Unsubscribes are interesting, because that means fans have explicitly removed themselves as fans rather than simply choosing to remove the page’s updates from their feeds. Which…alludes to the Big Wrinkle when it comes to fans — just because someone is a fan of your page doesn’t mean they’re seeing anything that happens on the page — it’s very easy for users to hide all updates from a page from their feeds. And Facebook doesn’t provide data as to how many people have done that!
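
A “leaky bucket” report is simple enough to rough out from the exported file. The Python sketch below uses a few made-up rows and hypothetical column names (date, new_fans, unsubscribes); the real Fans and Interactions export will have different headers, so the names would need to be adjusted to match.

```python
import csv
import io

# Made-up sample rows standing in for the exported CSV; column names are hypothetical.
SAMPLE_EXPORT = """date,new_fans,unsubscribes
2010-01-04,120,15
2010-01-05,95,22
2010-01-06,140,9
"""

def leaky_bucket(csv_text):
    """Total fans gained vs. lost over the period covered by the export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    gained = sum(int(row["new_fans"]) for row in rows)
    lost = sum(int(row["unsubscribes"]) for row in rows)
    # Fans lost for every 100 gained; undefined if nothing was gained
    leak_rate = round(100 * lost / gained, 1) if gained else None
    return {"gained": gained, "lost": lost, "net": gained - lost,
            "lost_per_100_gained": leak_rate}

report = leaky_bucket(SAMPLE_EXPORT)
print(report)
```

Keep in mind the Big Wrinkle above: this only counts explicit unsubscribes, not fans who quietly hid the page's updates.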

Fans and Interactions also provides data on the number of “interactions” which is the sum of all of the likes, posts, and comments that occur each day. In my mind, a “like” is a pretty light interaction, while a post or a comment is a more significant interaction, because a fan actually had to put together words to express an idea. Facebook Insights provides details for each type of interaction, too, though, so you can measure the different types of interactions. This export provides four types of interactions: Likes, Comments, Wall Posts, and Discussion Posts. It can get a little confusing as to which type of user activity occurs where, so be prepared to click back and forth between your page and the data for a while to get the hang of it (I’d write it out here, but this post is already getting pretty long and unwieldy!). The data also includes “Posts” — these are your posts rather than fan posts.

Finally, Fans and Interactions provides basic web analytics data. VERY basic. Page views, unique page views, audio plays, video plays, and photo views. At a very high level, this is useful information, as it’s a measure of whether the page is sufficiently engaging to drive people to visit (note that someone by no means has to be a fan to visit the page, view content, and comment on it — if a page has a lot of page views but a small number of fans, then it may be an indication that users would like to engage with the brand in Facebook, but the actual content/activity occurring on the page is not strong enough to get them to become a fan once they actually visit). Data that is not provided includes: which tabs of the fan page were visited, which videos were played (and how much of the video was viewed), and which photos were viewed. Supposedly, this sort of capability is in the works at Facebook, but no one I’ve talked to is committing to any dates for them to roll out.

Facebook Insights also doesn’t provide data on:

  • Suggest to Friends usage
  • Subscribe via SMS usage
  • Add to My Favorites usage
  • The ability to export wall posts, discussion posts, and comments (more on this in the last section of this post)
  • Page visit frequency

The lack of Suggest to Friends data is particularly painful — this would be a powerful measure of how engaging the content on the page is, and there is zip when it comes to any visibility into that.

I expect that Facebook Insights will evolve over time to provide more content-level detail, as well as usage of other “page” features. It’s less likely that Insights will evolve to include user-level detail due to privacy concerns, although it’s not inconceivable — this would be the equivalent of having access to detailed behavioral data for users who have registered with your web site and are making subsequent visits.

Facebook Pages — Web Analytics Measurement, Part I (The Ugly Part)

Depending on how you squint when you look at it, a Facebook fan page for your brand is just an off-site extension of your web site — just like any content you host on a third-party site (job postings that are hosted by a recruiting site, events that get managed through a third-party event management site, etc.). For third-party sites whose bread and butter is extending the content offerings from web sites, it’s common to deploy the main site’s web analytics page tag on the third-party content pages. There are myriad ways to set up the reporting for that in any web analytics tool — Google Analytics, SiteCatalyst, Webtrends, Coremetrics, etc. In theory, Facebook pages would be the same way — just as you can embed all sorts of rich content on custom tabs, it seems like you would be able to insert your web analytics page tag on the pages where you have heavy control over content.

But, Facebook currently has an industrial-sized monkey wrench inserted into that approach by not allowing JavaScript to execute on its pages. Presumably, this gets back to privacy: a concern that opening up the site to allow scripts to execute would open up the potential for some page admins to figure out a way to capture too much personal information from visitors/fans of their pages.

So, what options are there? There are several, but they’re all clunky.

[UPDATE: The next little section is continuing to evolve, as I've been doing a lot of digging and experimentation in this area, finding both new roadblocks as well as trying out workarounds]
Generally speaking:

  • Use an iFrame for the content and put your usual page tag in it; the wrinkle here is that you can’t put an iFrame on a custom tab, because it has to be a standalone application canvas. Now, you can include within the frame a dummied-up re-rendering of the tabs on your fan page, but that’s really not ideal. There is a mildly helpful thread on the Google Analytics forum on the subject, as well as a thread on the Facebook developers forum with some useful tips
  • Either use the <noscript> capability in your web analytics package (if one exists) or hack the actual image call that triggers a page view/action in your web analytics package; this is pretty cumbersome to do, and it has its limitations, as it’s essentially going back to the early days of page beacon/page dot technology for web analytics, but it’s better than what you get out of Facebook Insights
  • Build a custom solution that makes an image (or some other asset) call to a reporting server you manage — you would need a unique call for each activity you want to track — and then sift through the server log file to construct what’s happened; you’re going to run into challenges with caching of images, though, so this will be incomplete data at best
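For what it’s worth, the third approach can be sketched in a few lines of Python. This is only an illustration under assumptions: the reporting server (stats.example.com), the b.gif path, and the page/event/cb parameter names are all made up, and the log is assumed to be a standard Apache-style access log. The cb value is a cache-buster, which only helps if the markup is generated fresh for each request; where it isn’t, cached images never reach the server, and the counts stay incomplete.

```python
import re
import time
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical reporting server; the host, path, and parameter names are assumptions.
BEACON_BASE = "https://stats.example.com/b.gif"

# Pulls the requested path out of an Apache-style access log line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def beacon_url(page, event, cache_buster=None):
    """Build a unique image URL for one activity you want to track."""
    if cache_buster is None:
        cache_buster = int(time.time() * 1000)
    query = urlencode({"page": page, "event": event, "cb": cache_buster})
    return BEACON_BASE + "?" + query


def count_events(log_lines):
    """Sift through access log lines and count beacon hits per (page, event)."""
    beacon_path = urlparse(BEACON_BASE).path
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        requested = urlparse(match.group(1))
        if requested.path != beacon_path:
            continue  # some other asset on the server, not a beacon call
        params = parse_qs(requested.query)
        page = params.get("page", ["unknown"])[0]
        event = params.get("event", ["unknown"])[0]
        counts[(page, event)] += 1
    return counts
```

The URL from beacon_url() would go into an ordinary image tag on the custom tab, and count_events() would run over the server log afterwards.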

All of these only work on pages where you have a decent level of control over the content, which leaves out the Info, Photos, Videos, and Discussion tabs…and it’s a little dicey as to what’s doable on the Wall. But, presumably, it’s the custom tabs where you’re investing the most resources to develop content, so that’s a pretty good place to get some more granular web analytics data.

We’ve actually managed to get some tracking of interactions occurring on a user’s wall within a Flash-based status update using Google Analytics (using the third approach above), and we’re close to rolling out some pages that will use the second item above with Webtrends (which, we expect, will track both interactions within a Flash app and traffic to individual custom tabs).

[End of section that is still evolving]

In short, though, this is pretty messy.

Facebook Pages — Web Analytics Measurement, Part II (The Pretty…but Short…Part)

If you link back to your main site from your Facebook page (which, presumably, you do in multiple places), then standard parameter-based campaign tracking works. Use it. ‘nuf said.
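As a minimal sketch of what that tagging looks like, assuming Google Analytics (whose campaign parameters are utm_source, utm_medium, and utm_campaign; other tools use their own parameter names), here is a small Python helper. The example.com URL and the campaign values are placeholders, not a recommended naming scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_link(url, source, medium, campaign):
    """Append Google Analytics campaign parameters to a link, keeping any existing query."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))


# Placeholder link and values for a link placed on a fan page wall post.
print(tag_link("http://www.example.com/landing", "facebook", "social", "fan_page_wall"))
```

Using one helper for every link you post keeps the source and medium values consistent, which is what makes the resulting campaign reports readable.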

General Facebook Activity — Web Analytics Measurement

In addition to tracking links that you control on Facebook with campaign tracking (the previous section), you can and should look at Facebook as a broader source of traffic to your site. If you are posting content on your site that is share-worthy, then Facebook users can pick it up and share it through Facebook, which will drive referrals to your site. If you’ve actually enabled content-sharing capabilities on your site, and those capabilities include Facebook, then you can add campaign tracking parameters to content as it gets shared, which will give you better visibility into what specific content is most compelling and passed along. Beyond just the traffic to the site, the bounce rate and conversions from that traffic are useful — is the sharing of your content bringing visitors to your site who are finding value and doing valuable things?
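If your web analytics tool lets you export visit-level data, the bounce rate and conversion rate for Facebook-referred traffic are simple to compute. The sketch below assumes a made-up export format (one dictionary per visit with referrer, pageviews, and converted fields) and treats a single-pageview visit as a bounce; your tool’s export and its bounce definition may well differ.

```python
from urllib.parse import urlparse


def is_facebook_referral(referrer):
    """True when the referrer's host is facebook.com or one of its subdomains."""
    host = urlparse(referrer).hostname or ""
    return host == "facebook.com" or host.endswith(".facebook.com")


def summarize_facebook_visits(visits):
    """Bounce and conversion rates for visits referred from Facebook."""
    fb_visits = [v for v in visits if is_facebook_referral(v["referrer"])]
    total = len(fb_visits)
    if total == 0:
        return {"visits": 0, "bounce_rate": None, "conversion_rate": None}
    bounces = sum(1 for v in fb_visits if v["pageviews"] == 1)
    conversions = sum(1 for v in fb_visits if v["converted"])
    return {
        "visits": total,
        "bounce_rate": bounces / total,
        "conversion_rate": conversions / total,
    }


# Hypothetical visit records, purely for illustration.
sample_visits = [
    {"referrer": "http://www.facebook.com/l.php", "pageviews": 1, "converted": False},
    {"referrer": "http://www.facebook.com/home.php", "pageviews": 4, "converted": True},
    {"referrer": "http://www.google.com/search", "pageviews": 2, "converted": False},
]
print(summarize_facebook_visits(sample_visits))
```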

The caution here is to not get overly obsessed with Facebook as a source of traffic to your site. It certainly can (and probably should) be a source of traffic, but your site isn’t necessarily the best destination point for all of your customers. Just because this is easy data to get to doesn’t mean that it is the best data to use to measure the performance of your site.

General Facebook Activity — LOTS is Missing

Overall, Facebook measurement — measurement of what really matters — is still very immature. We’re largely stuck with measuring basic counts of things that are easy to measure: total fans, unique pageviews, etc. But, when it comes to both measuring the impact of a Facebook investment as well as being able to analyze what is and is not working, we’re missing a lot:

  • Impressions – how many people are actually being presented with content related to your brand? Besides Facebook display ads, this is total guesswork; just because a page posts a status update doesn’t mean it ever shows up on the screen of a fan (the update may slip well into the “More” area before the fan logs on again, the fan may have those updates hidden); “impressions” is far from being an end-all/be-all measure, but it’s a pretty good indicator of reach, and it’s really not available in the Facebook world
    [UPDATE: Since I originally wrote this post, I've found out that Facebook has something in the works for this -- the one referenceable source is Facebook Presentation Reveals "Post Analytics" And Real-Time Ad Targeting. It's a total crapshoot as to when this functionality will be available and to whom it will be available.][UPDATE No 2: This capability was formally rolled out on January 21, 2010. I posted my take on what that provides.]
  • Social Graph and Impact — all Facebook users (and, thus, all Facebook page fans) are not equal; all of the major online listening platforms attempt to measure the influence of the “speaker,” and, conceptually, this construct applies in the Facebook world, driven by various aspects of the user: how many friends they have, how often they update their status, and, most importantly, how often the content they share gets liked/commented on/re-shared; it is currently not possible to get any visibility into and segment users who are interacting with your brand on Facebook based on their influence in the medium
  • Sentiment – Facebook has the “Like” feature, but no comparable “Dislike” option; this is grade school manners enforcement: “If you can’t give it a thumbs-up, don’t give it any thumb at all…” From a brand perspective, though, it would be nice to be able to track what sorts of posts raise users’ ire (especially for user-generated content) without having to sift through individual posts and comments by hand, which leads me to…
  • Sentiment…continued — sentiment is a tough nut to crack, but it’s something that everyone who deals with social media recognizes as being important; while I don’t necessarily expect Facebook to develop sentiment measurement tools inherently, if Facebook Insights was enhanced to enable the export of all user interactions for a fan page, then third-party tools could be used to conduct a sentiment analysis, and that would be useful
    [UPDATE: While it's not necessarily a business/analyst-friendly option, the Facebook API does allow the retrieval of comments and posts. If you have the chops to tackle it, you can read about the options at http://wiki.developers.facebook.com/index.php/API#Data_Retrieval_Methods. One company that is using the API for that purpose (among others) is Vitrue -- comments and posts get pulled into their Vitrue SRM product in a pretty slick way.]
  • Online Listening…to Facebook – Google announced late last year that they were going to start crawling publicly available content in Facebook, and, presumably, online listening platforms will not be far behind (maybe some of them already do?). But, this listening is inherently limited to public content in Facebook (fan pages are public, so they would be included, presumably, which is a good thing). There would be a major backlash if Facebook enabled third-party tools to crawl and index “private” content. Does that mean that Facebook should enable its own intra-Facebook online listening capability? Marketers would certainly love to have the information, even if it is only available in a way that maintains users’ anonymity, but any move in this direction would be a dicey proposition for Facebook (even if they hid user information, it would be conceivable that users would provide enough information in what they post that a company would still be able to identify a specific individual; even if that was only going to be possible 1 time in 100,000, privacy advocates would jump all over Facebook for allowing the theoretical possibility)

It will be interesting to see where Facebook goes over the next 1-2 years when it comes to empowering marketers to measure and analyze their Facebook-based tactics. It should be a fun ride.

What am I missing here?

Four Books That Will Change the Way You Communicate

Tuesday, December 22nd, 2009 by Tim Wilson 1 Comment

I don’t think I will ever forget the first time that I made a presentation at work. It was just over a decade ago, I was just a few months into my employment at a company where I would work for the next eight years, and I was on the hook to present a new process to a room of 20 engineers. I diligently prepared my transparencies (I’m old enough to have used an overhead projector, but not old enough to refer to the medium they supported as “foils”). I rehearsed the material again and again.

And I bombed.

The material was dry as it was, but it wasn’t, by any means, unmanageable content. I just didn’t do a good job of managing it!

Fast forward 10 years, and I found myself giving a presentation to a room of 50-60 people, and the material was set up to be just as naturally engaging — presenting on an approach to measurement and analytics to…a bunch of marketers.

The presentation went much better, judging both from the engagement level of the audience and discussions that it has prompted weeks later. I’m no Steve Jobs, but I’ve paid attention to what seems to work and what doesn’t (both in my presentations and others), read some articles here and there, and, I realized, read a few books along the way that have really helped.

So, with that — four books that all have a heavy component of “how the brain works” and that, collectively, have taught me a lot about how to present information, be it a dashboard, a report, or a presentation.

Gladwell and Gilbert

The first two books are books that I read within a few months of each other. To this day, I recall specific anecdotes with no idea which book they came from. Blink: The Power of Thinking Without Thinking made the rounds when it first came out as “another great book by Malcolm Gladwell” (following The Tipping Point: How Little Things Can Make a Big Difference). The fundamental premise of Blink has to do with our “adaptive unconscious”: our intuition and ability to “know” things without fully needing to process them. As he dives into example after example, Gladwell touches on various aspects of how the brain works.

Daniel Gilbert’s Stumbling on Happiness takes a more directly psychological angle, but it covers some of the same territory. One of Gilbert’s main points is that the human brain does not remember things like we think it does — pointing out that a vividly remembered, down-to-the-color-of-the-shirt-you-were-wearing memory is not really an as-recorded memory at all. Rather, the brain remembers a few specific details and then makes up / fills in the rest when the memory gets called up. It’s so good at filling in these blanks that it fools itself into not being able to tell fact from interpolation!

Both of these books made an impact on me, because they pointed out that how we take in, process, and store information doesn’t work at all like we intuitively think it does. And, both books set up the next two books by shaking the assumptional foundations I had of how we, as humans, think.

Straight-Up Business Reading

Chip and Dan Heath’s Made to Stick: Why Some Ideas Survive and Others Die is a practical manual for communicating information that you want your audience to pay attention to and retain. They boil the components into a six-letter acronym, S.U.C.C.E.S., and go into each component in detail.

The elements are Simple, Unexpected, Concrete, Credible, Emotional, and Stories, and they provide a nice framework for critiquing how we communicate any idea. I recognize that I regularly struggle with Simple, Concrete, and Stories as elements in my blog posts. But, every element is one that can be injected using some discipline and time to do so. I nailed all three of these elements a number of years ago when I found myself on an internal lecture circuit trying to drum up large donors for my company’s annual United Way campaign. I was heavily invested in conveying a strong message, and I wound up using an example of my grandfather’s battle with Alzheimer’s as a way to pull the audience in and ask them to find something they were passionate about and support it. I also wove in various quirky takes on how $10/week would really add up (think the sort of thing you hear again and again from your local NPR station during fundraising drives). In the case of that campaign, we blew our numbers out of the water: we had a 500% increase in the number of people who gave at the “leadership level” that year. Now, a lot of things had to come together to make that happen, but, to this day, I’m sure my well-crafted, well-rehearsed, and sincere speech made to at least a dozen different groups of employees (and the fact that I was a fairly low-level employee making the case, asking people who were making a lot more money than I was to give at least as much as I was) played a non-trivial role.

And that was years before I read Made to Stick. But, the book helped me reflect on any number of presentations — ones that worked and ones that didn’t.

And, Finally, Wisdom from a Neuroscientist

The last book in this tetralogy is one that I just finished reading — Brain Rules: 12 Principles for Surviving and Thriving at Work, Home, and School, by John Medina. I stumbled across the book as a recommendation from Garr Reynolds of Presentation Zen, so I wasn’t surprised that it had some very practical tips, as well as the “why?” behind them, for communicating effectively. Medina’s premise is that there’s a ton of stuff we don’t yet understand about the brain. BUT, there are also a lot of things we do know about the brain, and many of those lay out pretty clearly that the way we work in business and the way our education system is set up both run counter to how the brain naturally functions.

These “things we do know” are broken down into 12 “rules” — exercise (good for the brain), survival (why and how the brain evolved…and implications), wiring (how the brain works at a highly micro level), attention (there’s NO SUCH THING as multitasking…and other goodies), short-term memory (what makes it there and how), long-term memory (what makes it there, how, and how long it takes to get there), sleep (good for the brain), stress (some kinds are good, some kinds are bad), sensory integration (the more senses involved, the better the memory), vision (the #1 sense), gender (men are from Mars…), exploration (age doesn’t really degrade our ability to learn). Medina ends each chapter (one rule per chapter) with “Ideas” — implications for the real world based on the information presented.

The book goes into very technical detail about how, when, and where electrical charges zip around in our skulls to accomplish different tasks. While that information is not directly applicable, each time he goes there it’s as a setup to more directly useful information. Throughout the book, Medina provides practical thoughts for how to communicate more effectively — helping people pay attention (getting the information you are communicating into working memory) and retain the information over both the short and the long term. Two of my absolute favorite nuggets from the book were:

  • p. 130 (in the chapter on long-term memory) – Medina has the reader do a little memory exercise with the following characters: “3 $ 8 ? A % 9.” The fact he drops after the exercise is interesting: “The human brain can hold about seven pieces of information for less than 30 seconds! If something does not happen in that short stretch of time, the information becomes lost.” This is about getting information on its way from working memory to long-term memory and how repetition, thinking about the information, and talking about the information all help it on its way. As a communicator (be it through a presentation or through a dashboard of data), this seems like powerful stuff: how often have we all seen someone cut loose with slide after slide of mind-numbing information? The human brain simply cannot take all of that in and retain it without some help!
  • p. 239 (in the chapter on vision) – Medina has a section titled “Toss your PowerPoint presentations.” I groaned. While I get highly annoyed by the rampant misuse of PowerPoint, I’m not a Tufte acolyte to the point that I see the tool itself as evil. In the second paragraph, though, Medina clarifies by providing a two-step prescription: 1) burn your current presentations, and 2) make new ones. Medina’s beef with PowerPoint is that the default slide template is text-based with a six-level hierarchy. This entire chapter is about how a picture really is worth 1,000 words, and Medina pleads with the reader to cut wayyy back on the text in his/her presentations (he has a fascinating explanation of how, when we read, we’re really interpreting each letter as a small picture…and that’s actually not a good thing for retention of information).

There are oodles of other good nuggets in the book, but these are two of the snippets that really resonated with me.

Better to Be Steve Jobs than Bill Gates

I do believe that some people have better communication instincts than others. I’ll never be Steve Jobs when it comes to holding an auditorium in the palm of my hand. But, between reading these books and thinking through my own evolution as a communicator (this blog notwithstanding…but I’ve always said that I write this blog to keep my e-mails shorter and to try out ideas that occur to me during the day — sorry folks…both of you…but this blog is mostly for me!), I’m convinced that effective communication is a trainable skill.

I’ve also noticed that, the more I have to communicate, and the more I work to do so effectively, the easier it seems to be getting. In another 20 years, I might just have it nailed!

The Spectrum of Data Sources for Marketers Is Wide (and Overwhelming)

Monday, December 14th, 2009 by Tim Wilson 1 Comment

I’ve been using an anecdote of late that Malcolm Gladwell supposedly related at a SAS user conference earlier this year: over the last 30 years, the challenge we face when it comes to using data to drive actions has fundamentally shifted from a challenge of “getting the right data” to “looking at an overwhelming array of data in the right way.” To illustrate, he compared Watergate to Enron — in the former case, the challenge for Woodward and Bernstein was uncovering a relatively small bit of information that, once revealed, led to immediate insight and swift action. In the latter case, the data to show that Enron had built a house of cards was publicly available, but there was so much data that actually figuring out how to extract the underlying chicanery without knowing exactly where to look for it was next to impossible.

With that in mind, I started thinking about all of the sources of data that marketers now have available to them to drive their decisions. The challenge is that almost all of the data sources out there are good tools — while they all claim competitive advantage and differentiation from other options…I believe in the free markets to the extent that truly bad tools don’t survive (do a Google search for “SPSS Netgenesis” and the first link returned is a 404 page — the prosecution rests!). To avoid getting caught up in the shiny baubles of any given tool, it seems worth organizing the range of available data some way — put every source into a discrete bucket.  It turns out that that’s a pretty tricky thing to do, but one approach would be to put each data source available to us somewhere on a broad spectrum. At one end of the spectrum is data from secondary research — data that someone else has gone out and gathered about an industry, a set of consumers, a trend, or something else. At the other end of the spectrum is the data we collect on our customers in the course of conducting some sort of transaction with them — when someone buys a widget from our web site, we know their name, how they paid, what they bought, and when they bought it!

For poops and giggles, why not try to fill in that spectrum? Starting from the secondary research end, here we go…!

Secondary Research (and Journalism…even Journalism 2.0)

This category has an unlistable number of examples. From analyst firms like Forrester Research and Gartner Group, to trade associations like the AMA or The ARF, to straight-up journalists and trade publications, and even to bloggers. Specialty news aggregators like alltop.com fall into this category as well (even if, technically, they would fit better into a “tertiary research” category, I’m going to just leave them here!).

I stumbled across iconoculture last week as one interesting company that falls in this category…although things immediately start to get a little messy, because they’ve got some level of primary research as well as some tracking/listening aspects of their offer.

Listening/Collecting

Moving along our spectrum of data sources, we get to an area that is positively exploding. These are tools that are almost always built on top of a robust database, because what they do is try to gather and organize what people — consumers — are doing/saying online. As a data source, these are still inherently “secondary” — they’re “what’s happening” and “what’s out there.” But, as our world becomes increasingly digital, this is a powerful source of information.

One group of tools here are sites like compete.com, Alexa, and even Google’s various “insights” tools: Google Trends, Google Trends for Websites, and Google Insights for Search. These tools tend to not be so much consumer-focussed as site-focussed, but they’re getting their data by collecting what consumers are doing. And they are darn handy.

“Online listening platforms” are a newer beast, and there seems to be a new player in the space every day. The Forrester Wave report by Suresh Vittal in Q1 2009 seems like it is at least five years old. An incomplete list of companies/tools offering such platforms includes (in no particular order…except Nielsen is first because they’re the source of the registration-free PDF of the Forrester Wave report I just mentioned):

And the list goes on and on and on… (see Marshall Sponder’s post: 26 Tools for Social Media Monitoring). Each of these tools differentiates itself from its competition in some way, but none of them has truly emerged as a sustained frontrunner.

Web Analytics

I put web analytics next on the spectrum, but recognize that these tools have an internal spectrum all their own. From the “listening/collecting” side of the spectrum, web analytics tools simply “watch” activity on your web site — how many people went where and what they did when they got there. Moving towards the “1:1 transactions” end of the spectrum, web analytics tools collect data on specifically identifiable visitors to your site and provide that user-level specificity for analysis and action.

Google Analytics pretty much resides at the “watching” end of this list, as does Yahoo! Web Analytics (formerly IndexTools). But, then again, they’re free, and there’s a lot of power in effectively watching activity on your site, so that’s not a knock against them. The other major players — Omniture Sitecatalyst, Webtrends, Coremetrics, and the like — have more robust capabilities and can cover the full range of this mini-spectrum. They all are becoming increasingly open and more able to be integrated with other systems, be that with back-end CRM or marketing automation systems, or be that with the listening/collecting tools described in the prior section.

The list above covered “traditional web analytics,” but that field is expanding. A/B and multivariate testing tools fall into this category, as they “watch” with a very specific set of options for optimizing a specific aspect of the site. Optimost, Omniture Test&Target, and Google Website Optimizer all fall into this subcategory.

And, entire companies have popped up to fill specific niches with which traditional web analytics tools have struggled. My favorite example there is Clearsaleing, which uses technology very similar to all of the web analytics tools to capture data, but whose tools are built specifically to provide a meaningful view into campaign performance across multiple touchpoints and multiple channels. The niche their tool fills is improved “attribution management” — there’s even been a Forrester Wave devoted entirely to tools that try to do that (registration required to download the report from Clearsaleing’s site).

Primary Research

At this point on the spectrum, we’re talking about tools and techniques for collecting very specific data from consumers — going in with a set of questions that you are trying to get answered. Focus groups, phone surveys, and usability testing all fall in this area, as well as a plethora of online survey tools. Specifically, there are online survey tools designed to work with your web site — Foresee Results and iPerceptions 4Q are two that are solid for different reasons, but the list of tools in that space outnumbers even the list of online listening platforms.

The challenge with primary research is that you have to make the user aware that you are collecting information for the purpose of research and analysis. That drops a fly in the data ointment, because it is very easy to bias that data by not constructing the questions and the environment correctly. Even with a poorly designed survey, you will collect some powerful data — the problem is that the data may be misleading!

Transaction Data

Beyond even primary research is the terminus of the spectrum — it’s customer data that you collect every day as a byproduct of running your business and interacting with customers. Whenever a customer interacts with your call center or makes a purchase on your web site, they are generating data as an artifact. When you send an e-mail to your database, you’ve generated data as to whom you sent the message…and many e-mail tools also track who opened and clicked through on the e-mail. This data can be very useful, but, to be useful, it needs to be captured, cleansed, and stored in a way that sets it up for useful analysis. There’s an entire industry built around customer data management, and most of what the tools and processes in that industry focus on is transaction data.

What’s Missing?

As much as I would like to wrap up this post by congratulating myself on providing an all-encompassing framework…I can’t. While there are a lot of specific tools/niches that I haven’t listed here that I could fit somewhere on the spectrum of tools as I’ve described it, there are also sources of valuable data that don’t fit in this framework. One type that jumps out to me is marketing mix-type data and tools (think Analytic Partners, ThinkVine, or MarketShare Partners). I’m sure there are many other types. Nevertheless, it seems like a worthwhile framework to have when it comes to building up a portfolio of data sources. Are you getting data from across the entire spectrum (there are free or near-free tools at every point on the spectrum)? Are you getting redundant data?

What do you think? Is it possible to organize “all data sources for marketers” in a meaningful way? Is there value in doing so?

How Succinctly Can I Explain Why Pie Charts Are Evil?

Wednesday, December 2nd, 2009 by Tim Wilson 5 Comments

I’m right at three months into my new gig, and, around the office, probably the most commonly known fact is, “He hates pie charts.” It’s not that I’ve exactly been standing at the elevator handing out leaflets explaining why pie charts are evil, but I have, apparently, chosen a couple of particularly public venues to make a mild statement or two. And, the quasi-preplanned visceral groan when some co-workers put up a pie chart might’ve contributed just a teensy bit.

I’ve been put on the spot since then a couple of times to do one of two things:

  • Explain why pie charts are evil, or
  • Agree that one or another particular usage of a pie chart is appropriate

After catching up on some blog reading yesterday morning and seeing an excellent example of pie chart alternatives from Jon Peltier, and then watching seven presentations yesterday, six of which used the same basic presentation template, and five of which stuck with a pie chart for the sole non-text slide in the presentation, how could I not write another post?! Let’s see how succinct I can make it (don’t hold your breath that you could read the whole thing before exhaling!).

Yes, There is ONE Thing That a Pie Chart Does Well

This kills me, because there’s one way, in a very narrow set of circumstances, that pie charts do marginally better than alternatives. All THREE of the following criteria have to be met for this to be the case:

  • Exactly 2 or 3 categories that make up the “whole”
  • A fairly significant difference in % makeup for each of the categories
  • Plenty of space available to present the information

99 times out of 100 when pie charts get used, at least one of these criteria is not met. But, there, I’ve admitted that there is a situation where pie charts are appropriate.

Of course, mullets are an appropriate hairstyle if you are prone to both warm ears and spontaneous hair donations…but that doesn’t mean I’m going to sport one!

Of Course, We Must Start with a Before/After Example

With only the category names changed, below is one of the pie charts I saw yesterday:

Pie Chart Example

In my experience, a simple horizontal bar chart is a better option (among a variety of better options):

Bar Chart Example

Why is this a better option? Oh, let me count the ways…

1. Rainbows Are Good in Princess Tales — Not in Data Visualization

When it comes to data visualization, a chart that doesn’t rely on multiple colors always trumps a chart that does. Four reasons:

  • If you use subtle/muted colors, you can’t get past 4 or 5 categories before you are asking the person reading the chart to work hard to distinguish between subtle shading differences
  • If you use bright/high-contrast colors, you’re asking your user to put on sunglasses to keep from wincing at the visual overkill
  • Roughly 8% of men have some form of color blindness, and it’s darn tricky to nail a palette with more than a small handful of colors that works across the various types of the condition (of course, if you’ve got a secret agenda to have women take over the world, this is one way to contribute, as color blindness is exceedingly rare in women)
  • Maybe you’re presenting your chart in glorious, projected color…but are you sure no one is going to try to print it in black-and-white?

These are all issues with any pie chart that has more than 3 categories. None of these are an issue with a horizontal bar chart.

2. Labels, Labels, Labels

If you’ve ever constructed a pie chart in Excel, you’ve run into the challenge of trying to get all of the wedges labeled right there on the chart. Excel continues to make odd choices as to where to wrap text in pie charts, and the circular nature of the whole layout means some wedges have plenty of horizontal labeling room, while others have almost none. You’ve tried some (or all) of the following:

  • Using leader lines for some of the wedges so you can label the most troubling wedges somewhere more spacious
  • Abbreviating the category names
  • Strategically rotating the chart so that the labeling all happens to work (it never does)
  • Rearranging the underlying data so that the pie wedges occur in a different order (which also never works)

After fiddling with the above, you finally break down and yank the labels from the chart and just use a legend. This is bad, bad, BAD! Scroll back up to the pie chart example above and pretend you’re actually trying to interpret the data, but pay attention to how many times you look back and forth between the legend and the pie. This is putting a totally unnecessary strain on your brain! Take a look at the horizontal bar chart — no jumping back and forth needed!

With a horizontal bar chart, the label sits right next to the data, and it doesn’t need to be abbreviated to do so (this is one reason that I find horizontal bar charts to be better than vertical column charts in many cases — with a horizontal orientation, the labels have more width with which to work).

3. Those Pesky Near-Zero Values

Pie charts suck at the small percentages. Small percentage categories wreak havoc on the labeling issue, for sure, but they’re also nearly impossible to compare to each other. In the example above, the smallest percentage is 3%, and that’s almost manageable. But, heaven forbid you have a couple of pesky sub-one-percent categories, and you’re looking at wedges that look suspiciously like the lines between wedges.

4. Seeing Small Differences

Fundoogles & Flibbers came in at 3%, while Dracula’s Mickety Micks came in at 5%. Do the wedge sizes really look different? That’s a fundamental challenge with pie charts — we don’t do a very good job of comparing the areas of these odd sorta-triangular-but-with-one-curved-side shapes. In the case of the bar chart, all you have to compare is lengths — much easier.
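To put rough numbers on that, here is a minimal Python sketch of the geometry. The 3% and 5% values come from the example above; the axis maximum and chart width are assumptions I made up for illustration, not measurements from the actual charts. A pie has to spread 100% across 360 degrees, so the two wedges span 10.8 and 18 degrees. A bar chart's axis only needs to reach the largest category, so the same two values can get proportionally more room.

```python
def wedge_angle(percent):
    """Central angle, in degrees, of a pie wedge holding `percent` of the whole."""
    return 360.0 * percent / 100.0


def bar_length(percent, axis_max, axis_width):
    """Length of a bar when the axis runs from 0 to `axis_max` over `axis_width` units."""
    return axis_width * percent / axis_max


# The two categories called out in the post.
small, large = 3, 5

# Assumptions for illustration only: the axis tops out at 40 percent
# and the plotting area is 400 pixels wide.
AXIS_MAX = 40
AXIS_WIDTH = 400

angle_gap = wedge_angle(large) - wedge_angle(small)
bar_gap = (bar_length(large, AXIS_MAX, AXIS_WIDTH)
           - bar_length(small, AXIS_MAX, AXIS_WIDTH))

print(f"Pie: wedges differ by {angle_gap:.1f} of 360 degrees")
print(f"Bar: bars differ by {bar_gap:.1f} of {AXIS_WIDTH} pixels")
```

Under those assumed dimensions, the gap between the two bars is a larger share of the chart than the gap between the two wedges, and the reader is comparing two lengths that start from the same baseline rather than two angles pointing in different directions.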

5. Economy (of Space) Is a Virtue

Check out the overall size of the charts. While they have the same font size, the same text displayed, and the same width, the bar chart is 20% shorter…and it could have been shorter still! Bar charts are more efficient space-wise. With pie charts, and largely because of the other issues listed above, it’s often necessary to make the chart larger and larger to make it readable.

Of Course, This Example Was At Least Flat

This post would be twice as long if I went into the additional issues of using the “3D effect” version of the pie chart.

[Update] Always Room for Improvement

Of course, the danger of posting a “here’s a better way” is that you leave yourself open for suggestions as to how the better way can be improved! See Naomi’s comment below. She raises a good point — basically, that I didn’t do a great job of heeding the data-pixel ratio with my bar chart! So, below is a revised version.

Bar Chart Example

In a subsequent email exchange, Naomi made the case for keeping the x-axis and the numbers, but simply removing the “%” signs entirely and putting the word “Percent” in the axis label:

Bar Chart Example

Her main point is that numbers can be read more easily if they are not cluttered with symbols like dollar signs and percent signs. And, her case for keeping the gridlines and labeled axis is that it helps show that the bars are drawn to scale — there hasn’t been any incorrect or misleading scaling (intentional or not — in the same spate of presentations that spurred this post, there was a bar chart with an accompanying table of data…and one of the bars was clearly not accurate).
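For anyone who wants to play with that labeling idea outside of Excel, here is a rough plain-text sketch in Python. It is not how any of the charts in this post were built. The function name `render_bar_chart`, the `#` bars, and "Hypothetical Category A" with its value are all invented for illustration; only the 3 and 5 come from the example above. The sketch sorts the categories, puts each label right next to its bar, prints bare numbers, and states the unit once in the axis label.

```python
def render_bar_chart(data, width=40, axis_label="Percent"):
    """Return a plain-text horizontal bar chart, largest value first."""
    rows = sorted(data.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(label) for label, _ in rows)
    max_value = max(value for _, value in rows)

    lines = []
    for label, value in rows:
        # Every bar uses the same scale, so lengths are directly comparable.
        bar = "#" * round(width * value / max_value)
        # Bare number: the unit lives in the axis label, not on every value.
        lines.append(f"{label.rjust(label_width)} | {bar} {value:g}")

    # The unit appears exactly once, as the axis label.
    lines.append(f"{' ' * label_width} | ({axis_label})")
    return "\n".join(lines)


# Two values from the post plus one made-up category (an assumption).
sample = {
    "Fundoogles & Flibbers": 3,
    "Dracula's Mickety Micks": 5,
    "Hypothetical Category A": 40,
}

chart = render_bar_chart(sample)
print(chart)
```

Because the scaling happens in one place, a bar that is out of proportion to its number (like the inaccurate one in that presentation) can't sneak in by hand.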

I’m partial to the version with all of the lines removed, but, at this point, the debate is at a much healthier level than “pie vs. bar,” so I’m happy!