All Web Analytics Tools Are the Same (at least when it comes to data capture)


I started to write a post on using web analytics tools — Google Analytics, specifically, but with a nod to Webtrends as well — to track traffic to custom tabs and interactive elements on Facebook pages. But, as I started thinking through that content, I realized that I needed to back up and make sure I had a good, clean explanation of a key aspect of the mechanics of page tag-based web analytics tools. I poked around on the interweb a bit and found some quick explanations that were accurate, but that really weren’t as detailed as I was hoping to find.

Regardless of whether you’re trying to track Facebook or not, it’s worth having a good, solid understanding of these underlying mechanics:

  • If you’re a web analyst, understanding this is like understanding gravity if you’re a human being — there are some immutable laws of the internet, and knowing how those laws drive the data you are seeing will open up new possibilities for capturing activity on your site
  • If you’re a developer, then this will be a quick read, but understanding it will make you the hero to both your web analysts and (assuming they’re not glory hogs) the people they support with their analysis, because you will be able to suggest some clever ways to capture useful information

By the end of this post, you should understand both the title and why the URLs listed below are what make it so:

  • Google Analytics = http://www.google-analytics.com/__utm.gif
  • Webtrends = http://statse.webtrendslive.com/<ID>/dcs.gif
  • Sitecatalyst = https://<custom domain>/b/ss/<account name>/1/<code version>/<random ID>
  • Coremetrics = http://<custom domain>/cm or http://<custom domain>/eluminate

I’ve been deep under the hood with both Google Analytics and Webtrends for this, but the same principles apply to all tools (because they’re all bounded by the Physics of the Internet). I’m going to talk about Google Analytics the most in-depth, because it has the largest market share (measured by number of sites tagged with it), and I’ll try to call out key differences when appropriate.

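Let’s start with a simple picture of how all of these tools work. When a visitor comes to a page on your site, the following sequence of events happens:

  1. Javascript figures out stuff about the visitor
  2. Javascript packages that info into a single string of information
  3. Javascript makes an image request with that string tacked on the end
  4. The web analytics tool reads the string and puts the information into a database
  5. A web analyst queries the database for insights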

Steps 2 and 3 are really the crux of the biscuit, but we need to make sure we’re all clear on the first step, too, before getting to the fun there.

1 – Javascript figures out stuff about the visitor

We all know what Javascript is, right? It’s one of the key languages that can be interpreted by a web browser so that web pages aren’t just static text and images: dropdown menus, mouseovers, and such. But, Javascript also enables some things to go on behind the scenes. The basic data capture method for any tag-based web analytics tool is to run Javascript to determine what page the visitor is on, what relevant cookies are set on the user’s machine, whether the visitor has been to the site before, what browser the visitor is using, what language encoding is set for the browser, the user’s screen resolution, and a slew of other fairly innocuous details. This happens every time a visitor views a page running the page tag. So, great — a visitor has viewed a page, and the Javascript has figured out a bunch of details about the visitor and the page. Now what? It’s on to step 2!

(I realize I’m saying “Javascript” here, and most tools also have Actionscript support for tracking activity within Flash — for the purposes of this post, I’m just going to stick with Javascript, but I’ll get back to Actionscript in my next post!)

2 – Javascript packages that info into a single string of information

The next step is pretty simple, but it’s where the magic starts to happen. Let’s say the Javascript in step 1 had figured out the following information about a visitor to a page:

  • Site = www.gilliganondata.com
  • Page title = The Fun of Facebook Measurement
  • Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/
  • Browser language = en-us

Converting that info into a single string is pretty straightforward. Let’s start by pretending we’re going to put it into a single row in a pipe-delimited file. It would look like this:

Site (hostname) = www.gilliganondata.com | Page name = The Fun of Facebook Measurement | Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | Browser language = en-us

Now, rather than using the pretty, readable names for each of the four characteristics of the page view, let’s use some variable names (these are the Google Analytics variable names, but the documentation for any web analytics tool will provide their specific variable names for these same things):

  • Site (hostname) –> utmhn
  • Page title –> utmdt
  • Page URL –> utmp
  • Browser language –> utmul

So, now our string looks like:

utmhn = www.gilliganondata.com | utmdt = The Fun of Facebook Measurement | utmp = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | utmul = en-us

We used pipes to separate out the different variables, but there’s nothing really wrong with using something different, is there? Let’s go with using “&” instead and eliminate the spaces around equal signs and the delimiters. The single string now looks like this:

utmhn=www.gilliganondata.com&utmdt=The Fun of Facebook Measurement&utmp=/index.php/2010/01/11/the-fun-of-facebook-measurement/&utmul=en-us

Now, we’ve still got some “special” characters that aren’t going to play nice in Step 3, namely spaces and “/”s, so let’s replace those characters with the appropriate URL encoding (%20 for the spaces and %2F for the “/”s):

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us

It looks a little messy, but it’s a single, portable string that has the exact information that was listed in the four bullets that started this section. While it might be painful to reverse-engineer this string into a more reader-friendly format by hand, it’s a snap to do programmatically (which is exactly what web analytics tools do…as we’ll discuss in step 4) or in Excel.

Before we move on, let’s tack one more parameter onto our string. This is something that is actually hard-coded into the Javascript, and it identifies which web analytics account this traffic needs to go to. In the case of this blog, that account ID is “UA-2629617-3” and the variable Google Analytics uses to identify the account parameter is “utmac.” I’ll just tack that on the end of our string, which now looks like:

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3
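To make the packaging step concrete, here is a minimal sketch of it. It is written in Python purely for illustration (the real page tag does this in Javascript), and the dictionary of values is simply the example data from this section, not anything read from a live browser.

```python
from urllib.parse import quote, urlencode

# Example values from this section, keyed by the Google Analytics
# parameter names. In a real page tag these would be collected by
# Javascript in step 1; here they are hard-coded for illustration.
hit = {
    "utmhn": "www.gilliganondata.com",
    "utmdt": "The Fun of Facebook Measurement",
    "utmp": "/index.php/2010/01/11/the-fun-of-facebook-measurement/",
    "utmul": "en-us",
    "utmac": "UA-2629617-3",
}

# quote_via=quote with safe="" encodes spaces as %20 and "/" as %2F,
# and urlencode joins the key=value pairs with "&".
query_string = urlencode(hit, quote_via=quote, safe="")
print(query_string)
```

The printed string should match the one built by hand above, which is the point: the "messy" string is just a mechanical transformation of a handful of key-value pairs.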

A subtle point: what we’ve really done above is to combine all the information into a single string with a series of “key-value pairs.” In the case of the first variable, the “key” is “utmhn” and the “value” is “www.gilliganondata.com.” Notice that both the key AND the value are included in the string. If you’ve worked with comma-delimited or tab-delimited files, then you might be wondering why the key is included. Why can’t the Javascript always pass in the variables in the same order, and the web analytics server would know that the first value is the hostname, the second value is the title, and so on? There are at least three reasons for this:

  • It just generally makes the process more robust because it reaffirms to the server exactly what each value means at the point the server receives the information; the internet is messy, so hiccups can happen
  • Most “advanced” features when it comes to capturing web analytics data rely on tacking on additional parameters to the master string — by including both the key and the value for every parameter, that fanciness doesn’t have to worry about the order the parameters are passed in, AND it means the custom parameters get viewed/processed exactly the same way that the basic parameters do
  • The “key-value pairs separated by the & sign” are standard on the internet. Go to any online retail site and poke around, and you will see them in the URL. It’s kind of a standard way to transmit a series of variables onto the back end of a web page or image request, and that’s really all that’s going to happen in step 3
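The order-independence point in the bullets above can be sketched in a few lines. This is an illustration in Python, not what any vendor's server actually runs, and the extra parameter name "custom1" is made up for the example.

```python
from urllib.parse import parse_qsl

# The same three parameters, sent in two different orders.
in_order = "utmhn=www.gilliganondata.com&utmul=en-us&utmac=UA-2629617-3"
shuffled = "utmac=UA-2629617-3&utmhn=www.gilliganondata.com&utmul=en-us"


def to_pairs(query_string):
    """Split a query string into a {key: value} dictionary."""
    return dict(parse_qsl(query_string))


# Because every value travels with its key, order does not matter.
same = to_pairs(in_order) == to_pairs(shuffled)
print(same)

# A custom parameter (hypothetical name "custom1") is parsed exactly
# like the standard ones, wherever it appears in the string.
extended = to_pairs(shuffled + "&custom1=hello")
print(extended["custom1"])
```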

We’ve got our string, so now let’s do something with it!

3 – Javascript makes an image request with that string tacked on the end

Somehow, we need to pass that string back to the web analytics server. We do that by making an image call. In the case of Google Analytics that image request is always, always, always exactly the same, no matter the site using Google Analytics:

http://www.google-analytics.com/__utm.gif

Just like we covered in the “online retail site” URL structure discussion at the end of the last section, we’re going to tack some parameters on the end of the __utm.gif request. The standard way to take a base URL and tack on parameters is to add a “?” followed by one or more key-value pairs that are separated by an “&” sign. Lucky for us, the “&” sign is what we used when we were building our string in the last section! So:

http://www.google-analytics.com/__utm.gif

+

?

+

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

=

http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3
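The same addition can be written as a short sketch, again in Python for illustration only. It glues the pieces together and then splits the result back apart to show that nothing exotic is going on: the request is an ordinary URL with a host, a path, and a query string.

```python
from urllib.parse import urlsplit

BASE_URL = "http://www.google-analytics.com/__utm.gif"

# The string built in the previous section.
query_string = (
    "utmhn=www.gilliganondata.com"
    "&utmdt=The%20Fun%20of%20Facebook%20Measurement"
    "&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F"
    "&utmul=en-us"
    "&utmac=UA-2629617-3"
)

# Base URL + "?" + key-value pairs = the image request.
request_url = BASE_URL + "?" + query_string

# Splitting the URL recovers the three pieces.
parts = urlsplit(request_url)
print(parts.netloc)  # the web analytics server
print(parts.path)    # the image being requested
print(parts.query == query_string)
```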

Wow, that looks messy, but it just looks messy — it’s actually quite clean! In reality, there are way more than five parameters tacked onto the image request. As a matter of fact, the request above would really look more like this:

http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1516518290&utmhn=www.gilliganondata.com&utmcs=UTF-8&utmsr=1920x1080&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmdt=The%20Fun%20of%20Facebook%20Measurement%20%7C%20Gilligan%20on%20Data%20by%20Tim%20Wilson&utmhid=1640286085&utmr=http%3A%2F%2Fgilliganondata.com%2F&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmac=UA-2629617-3&utmcc=__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2%3B%2B__utmz%3D116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1

You can get a complete list of the Google Analytics tracking variables from Google. If you’re really into this, check out the utmcc value: it is a single parameter that includes multiple sub-parameters, which are separated by “%3B” (a URL-encoded semicolon, followed in this example by “%2B,” a URL-encoded plus sign) instead of an “&”. These are the user cookie values, which you can find towards the end of the long string above if you look for them. You can inspect the specific calls using any number of tools. I like to use the Firebug plugin for Firefox, but Fiddler is another free tool, and Charles is the standard tool used at my company. And, there’s always WASP to provide the “clean” view of the parameters (I use WASP heavily…unless I’m trying to reverse-engineer the specific calls being made for some reason).
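If you want to see the nested structure of utmcc without squinting, here is a small Python sketch that unpacks the value from the sample request above. The delimiters are read off that sample string (an encoded semicolon and plus sign between cookies, an encoded equals sign between each cookie name and its value); I'm not claiming this is how Google parses it on their end.

```python
from urllib.parse import unquote

# The utmcc value exactly as it appears in the long request above.
utmcc_encoded = (
    "__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2"
    "%3B%2B"
    "__utmz%3D116252048.1267294551.1.1."
    "utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B"
)

# Undo the URL encoding: %3D -> "=", %3B -> ";", %2B -> "+", %7C -> "|".
utmcc = unquote(utmcc_encoded)

cookies = {}
for chunk in utmcc.split(";"):
    chunk = chunk.lstrip("+")
    if not chunk:
        continue  # skip the empty piece after the trailing ";"
    # Split on the FIRST "=" only; the __utmz value contains "=" itself.
    name, _, value = chunk.partition("=")
    cookies[name] = value

for name, value in cookies.items():
    print(name, "->", value)
```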

The Javascript makes a request for that URL. This is the infamous “1×1 image.” Just to sharpen the edges a little bit on some common misconceptions about that image request:

  • The request for the image is what matters. While the 1×1 image will get delivered back, by the time www.google-analytics.com actually sends out the image, the page view has already been counted. As a matter of fact, if there were no __utm.gif image, the traffic would still get counted simply by virtue of the fact that the Google Analytics server received the image request. As it happens, some other little user experience hiccups can happen if there’s no actual image, but the existence of the file matters not at all from a data capture perspective!
  • Yes, you can actually just request the image directly from your browser. Go ahead — here’s the URL as a hyperlink: http://www.google-analytics.com/__utm.gif (yeah, it’s something of a letdown, but now you can say you’ve done it)
  • The image isn’t 1×1 pixels in order to be small and go unnoticed by the user. If Google got a wild hair to replace the __utm.gif image with a 520×756 pixel image of a psychedelic interpretation of the Mona Lisa…no one would ever see the change (unless they were doing something silly like calling the image directly from their browser as described in the previous bullet). The image gets requested by the Javascript, but it never gets displayed to the user. It’s sort of like a Javascript dropdown menu: the text for the dropdown gets loaded into the browser memory so that, if you mouse over the menu, the text is already there and can be displayed immediately. The __utm.gif request is the same way…except there’s nothing in the Javascript that ever actually tries to render the image to the user
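The "request is what matters" idea from the first bullet can be sketched from the server's point of view. This is a toy stand-in written in Python, not Google's implementation: the function name, the in-memory list used as a log, and the send_image switch are all invented for the illustration.

```python
import base64
from urllib.parse import parse_qsl, urlsplit

# A commonly used transparent 1x1 GIF, stored as base64.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

hits = []  # stand-in for the collection server's log


def handle_request(request_path, send_image=True):
    """Record the hit first, then (optionally) return the image bytes."""
    params = dict(parse_qsl(urlsplit(request_path).query))
    hits.append(params)  # the page view is counted at this point
    return PIXEL if send_image else b""


# One request where the image is returned, one where it is not.
handle_request("/__utm.gif?utmhn=www.gilliganondata.com&utmp=%2F")
handle_request(
    "/__utm.gif?utmhn=www.gilliganondata.com&utmp=%2Fabout%2F",
    send_image=False,
)

# Both hits were recorded, whether or not an image went back.
print(len(hits))
```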

And one more point: While we’ve been talking about “image requests” here, it doesn’t have to be an image request per se. In the case of Google Analytics, it is. In the case of Webtrends, it is, too (the image is called dcs.gif). In the case of other web analytics packages, it’s not necessarily an image request, but it is a request to the web analytics server. What matters is understanding that there are a bunch of key-value pairs tacked on after a “?” in the request, and that’s where all of the fun information about the visit to the page gets recorded and passed.

4 – Web analytics tool reads the string and puts the information into a database

So, the web analytics server has been getting bombarded with the requests from Step 3. Can you see how straightforward it is for software to take those requests and split them back out into their component parts? That’s the easy part. Where the tools really differentiate themselves is how exactly they store all of that data — the design of their database and then how that data is made available for queries and reports by analysts.
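Here is a rough sketch of that "easy part," in Python with an in-memory SQLite database. The table layout (one row per parameter per hit) is an assumption made for the illustration and says nothing about how any real tool stores its data; the second request in the list is made-up sample data.

```python
import sqlite3
from urllib.parse import parse_qsl, urlsplit

# Requests as the server might have received them. The first is the
# example from this post; the second is invented sample data.
requests_received = [
    "/__utm.gif?utmhn=www.gilliganondata.com"
    "&utmdt=The%20Fun%20of%20Facebook%20Measurement"
    "&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F"
    "&utmul=en-us&utmac=UA-2629617-3",
    "/__utm.gif?utmhn=www.gilliganondata.com&utmdt=Home"
    "&utmp=%2F&utmul=en-gb&utmac=UA-2629617-3",
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hit_params (hit_id INTEGER, name TEXT, value TEXT)")

for hit_id, path in enumerate(requests_received, start=1):
    # parse_qsl splits on "&" and "=" and undoes the URL encoding.
    pairs = parse_qsl(urlsplit(path).query)
    db.executemany(
        "INSERT INTO hit_params VALUES (?, ?, ?)",
        [(hit_id, name, value) for name, value in pairs],
    )

# The readable page title is back, with its spaces restored.
title = db.execute(
    "SELECT value FROM hit_params WHERE hit_id = 1 AND name = 'utmdt'"
).fetchone()[0]
print(title)
```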

Back in the day (and I assume it’s still an option), Webtrends would make the raw log files available to their customers as an add-on service. That was handy — once we understood the basics of this post and the Webtrends query parameters, we were able to sift through for some juicy nuggets to supplement our “traditional” web analytics (these were in the days before Webtrends had their “warehouse” solution, which would have made the same information available).

5 – Web analyst queries the database for insights

Like step 4, this is an area where web analytics tools really differentiate themselves. In the case of Google Analytics, there is the web-based tool and the API. In the case of paid, enterprise-class tools, there are similar tools plus true data warehouse environments that allow much more granular detail, as well as two-way integration with other systems.
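At its simplest, the querying in this step boils down to something like the sketch below: count the stored hits, grouped by page. It is Python with SQLite again, and the table, its columns, and the three rows of data are all hypothetical; real tools expose this through their own reports and APIs rather than raw SQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_views (page TEXT, language TEXT)")

# Hypothetical stored hits: one row per page view.
db.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("/index.php/2010/01/11/the-fun-of-facebook-measurement/", "en-us"),
        ("/index.php/2010/01/11/the-fun-of-facebook-measurement/", "en-gb"),
        ("/", "en-us"),
    ],
)

# A basic "top pages" report: page views per page, busiest first.
report = db.execute(
    "SELECT page, COUNT(*) AS views FROM page_views "
    "GROUP BY page ORDER BY views DESC, page"
).fetchall()

for page, views in report:
    print(views, page)
```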

Why Understanding This Matters

You’re still reading, so maybe I should have made this case earlier. But this matters because, once you understand these mechanics, you can start to do some fun things to handle unique situations. For instance, what do you do if you have Google Analytics, and you want to track activity somewhere where Javascript won’t run (like…um…your Facebook fan page; that’ll be my next post!)? Or, more generally, if you’re Googling around looking for ways to address some sort of one-off tracking need, you’ll understand the explanations that you’re finding, because these solutions invariably involve twiddling around within the framework described here.

As I read back through this post before publishing it, I was struck by how far into the tactical mechanics of web analytics it is. The overwhelming majority of web analytics blog posts focus on step 5 and beyond — how to use the data to be an analysis ninja rather than a report monkey. Understanding the mechanics described here is a foundational step that will support all of that analysis work. I was incredibly fortunate, early in my web analytics career, to have an opportunity to run the migration from a log-based web analytics package to a tag-based solution. I was triply fortunate that I worked on that migration with two brilliant and patient IT folk: Ernest Mueller as the web admin (who regularly shares his knowledge these days as a contributor to http://www.webadminblog.com/) supporting the effort, and Ryan Rutan, the developer supporting the effort — he was hacking the Webtrends page tag before the consultant who we had on-site to help implement it had finished his first day. Ernest drew countless whiteboard diagrams to explain to me “how the internet works” (those “immutable laws” I mentioned early in this post), while Ryan repeated himself again and again until I understood this whole “image request with parameters” paradigm.

If you’re a web analyst, seek out these types of people in IT. A hearty collaboration of cross-discipline skills can yield powerful results and be a lot of fun. I had similar collaborations when I worked at Bulldog Solutions, and the last two weeks saw the same thing happening at my current gig at Resource Interactive. Those are pretty energizing experiences that leave me scratching my head as to why so many companies wind up with an adversarial relationship between “the business” and “IT.” But THAT is a topic for a whoooollllle other post that I may never write…


15 Comments


  1. Pingback Web Analytics Tracking on a Facebook Page | Gilligan on Data by Tim Wilson

  2. I would suggest giving Clicktale a try. I’ve been using it for 2 months and it is neat to watch what your users do. I learned a lot.

  3. Nice summary

    You might like to have a look at the opensource jsHub project:

    http://jshub.org/

    which is attempting to deal with core problem of making all this stuff easier to understand and visible to the Web Analyst.

    A key requirement of any web analytics product is getting accurate data into it in the first place, and there are so many things that can be misunderstood by a developer during the technical implementation that result in the permanent loss of data.

  4. Required reading for anyone serious about website measurement and analysis. Great job breaking it into its component pieces and explaining the value of understanding. Next assignment to folks reading: download Charles proxy and start watching these js calls and pixel requests happen in real time to get a sense for what information is being passed by each site you visit.

  5. Thanks for sharing this group of lessons learned. You have added a few points that I need to go away and consider.

  6. Great summary, you make a crusty old Web Admin proud 🙂

    The other important thing to note about this method is that it’s a little lossy. We spent months trying to figure out why when we first implemented page tagging, moving from our older log-based model, that we seemed to be missing visits. I spent a lot of time comparing our access logs to the raw page tag log from the supplier and in the end, for mysterious reasons (and believe me I know and accounted for all the reasons page tag results are *supposed* to be different from log results), page tagging was dropping about 5% of our visits. (Page tags got some visits logging didn’t, but lost a bunch logging did, and this is the net difference.) In the end, the supplier admitted “Yeah… That happens.”

    For marketing purposes, that’s usually just fine. But I think it’s important to note that what you see from page tagging isn’t 100% of what’s actually happening, so don’t make logic mistakes based on it (e.g. “I’m sure no one from Company X has looked at our Web site, because it doesn’t show up in our page tag based analytics.”).

  7. Pingback link list for march 2010 « Mixotricha

  8. Hey Tim, great article.
    We’re currently working on an analytics tool that works a bit differently and is suited for custom use cases: the user just sends the data that’s interesting to him and he doesn’t need to use JavaScript in this case.

    If you want to have a look, just check my website.

  9. Pingback Gilligan’s eMetrics Recap — Washington, D.C. 2010 | Gilligan on Data by Tim Wilson

  10. Pingback Web Analytics tools Comparison -- Sitecatalyst, Google Analytics, Webtrends, Coremetrics | Gilligan on Data by Tim Wilson

  11. Pingback Web Analytics (How It Works) Explained in 4 Minutes | Gilligan on Data by Tim Wilson
