Analyzing Twitter — Practical Analysis

By on in , , , , with 3 Comments

In my last post, I grabbed tweets with the “#emetrics” hashtag and did some analysis on them. One of the comments on that post asked what social tools I use for analysis — paid and free. Getting a bit more focussed than that, I thought it might be interesting to write up what free tools I use for Twitter analysis. There are lots of posts on “Twitter tools,” and I’ve spent more time than I like to admit sifting through them and trying to find ones that give me information I can really use. This, in some ways, is another one of those posts, except I’m going to provide a short list of tools I actually do use on a regular basis and how and why I use them.

What Kind of Analysis Are We Talking About?

I’m primarily focussed on the measurement and analysis of consumer brands on Twitter rather than on the measurement of one’s personal brand (e.g., @tgwilson). While there is some overlap, there are some things that make these fundamentally different. With that in mind, there are really three different lenses through which Twitter can be viewed, and they’re all important:

  • The brand’s Twitter account(s) — this is analysis of followers, lists, replies, retweets, and overall tweet reach
  • References of the brand or a campaign on Twitter — not necessarily mentions of @<brand>, but references to the brand in tweet content
  • References to specific topics that are relevant to the brand as a way to connect with consumers — at Resource Interactive, we call this a “shared passion,” and the nature of Twitter makes this particularly messy, but, to whatever level it’s feasible, it’s worth doing

While all three of these areas can also be applied in a competitor analysis, this is the only mention (almost) I’m going to make of that  — some of the techniques described here make sense and some don’t when it comes to analyzing the competition.

And, one final note to qualify the rest of this post: this is not about “online listening” in the sense that it’s not really about identifying specific tweets that need a timely response (or a timely retweet). It’s much more about ways to gain visibility into what is going on in Twitter that is relevant to the brand, as well as whether the time spent investing in Twitter is providing meaningful results. Online listening tools can play a part in that…but we’ll cover that later in this post.

Capturing Tweets?

When it comes to Twitter analysis, it’s hard to get too far without having a nice little repository of tweets themselves.  Unfortunately, Twitter has never made an endless history of tweets available for mining (or available for anything, for that matter). And, while the Library of Congress is archiving tweets, as far as I know, they haven’t opened up an API to allow analysts to mine them. On top of that, there are various limits to how often and how much data can be pulled in at one time through the Twitter API. As a consumer, I suppose I have to like that there are these limitations. As a data guy, it gets a little frustrating.

Two options that I’ve at least looked at or heard about on this front…but haven’t really cracked:

  • Twapper Keeper — this is a free service for setting up a tweet archive based on a hashtag, a search, or a specific user. In theory, it’s great. But, when I used it for my eMetrics tweet analysis, I stumbled into some kinks — the file download format is .tar (which just means you have to have a utility that can uncompress that format), and the date format changed throughout the data, so getting all of the tweets’ dates readable took some heavy string manipulation
  • R — this is an open source statistics package, and I talked to a fellow several months ago who had used it to hook into Twitter data and do some pretty intriguing stuff. I downloaded it and poked around in the documentation a bit…but didn’t make it much farther than that

I also looked into just pulling Tweets directly into Excel or Access through a web query. It looks like I was a little late for that — Chandoo documented how to use Excel as a Twitter client, but then reportd that Twitter made a change that means that approach no longer works as of September 2010.

So, for now, the best way I’ve found to reliably capture tweets for analysis is with RSS and Microsoft Outlook:

  1. Perform a search for the twitter username, a keyword, or a hashtag from (or, if you just want to archive tweets for a specific user, just go to the user’s Twitter page)
  2. Copy the URL for the RSS for the search (or the user)
  3. Add a new RSS feed in MS Outlook and paste in the URL

From that point forward, assuming Outlook is updating periodically, the RSS feeds will all be captured.

There’s one more little trick: customize the view to make it more Excel/export-friendly. In Outlook 2007, go to View » Current View » Customize Current View » Fields. I typically remove everything except From, Subject, and Received. Then go to View » Current View » Format Columns and change the Received column format from Best Fit to the dd-Mmm-yy format. Finally, remove the grouping. This gives you a nice, flat view of the data. You can then simply select all the tweets you’re interested in, press <Ctrl>-<C>, and then paste them straight into Excel.

I haven’t tried this with hundreds of thousands of tweets, but it’s worked great for targeted searches where there are several thousand tweets.

Total Tweets, Replies, Retweets

While replies and retweets certainly aren’t enough to give you the ultimate ROI of your Twitter presence, they’re completely valid measures of whether you are engaging your followers (and, potentially, their followers). Setting up an RSS feed as described above based on a search for the Twitter username (without the “@”) will pick up both all tweets by that account as well as all tweets that reference that account.

It’s then a pretty straightforward exercise to add columns to a spreadsheet to classify tweets any number of ways by some use of the IF, ISERROR, and FIND functions. These can be used to quickly flag each tweet  as a reply, a retweet, a tweet by the brand, or any mix of things:

  • Tweet by the brand — the “From” value is the brand’s Twitter username
  • Retweet — tweet contains the string “RT @<username>
  • Reply — tweet is not a retweet and contains the string “@<username>

Depending on how you’re looking at the data, you can add a column to roll up the date — changing the tweet date to be the tweet week (e.g., all tweets from 10/17/2010 to 10/23/2010 get given a date of 10/17/2010) or the tweet month. To convert a date into the appropriate week (assuming you want the week to start on Sunday):


To convert the date to the appropriate month (the first day of the month):


C1, of course, is the cell with the tweet date.

Then, a pivot table or two later, and you have trendable counts for each of these classifications.

This same basic technique can be used with other RSS feeds and altered formulas to track competitor mentions, mentions of the brand (which may not match the brand’s Twitter username exactly), mention of specific products, etc.

Followers and Lists

Like replies and retweets, simply counting the number of followers you have isn’t a direct measure of business impact, but it is a measure of whether consumers are sufficiently engaged with your brand. Unfortunately, there are not exactly great options for tracking net follower growth over time. The “best” two options I’ve used:

  • Twitter Counter — this site provides historical counts of followers…but the changes in that historical data tend to be suspiciously evenly distributed. It’s better than nothing if you don’t have a time machine handy. (See the Twitalyzer note at the end of this post — I may be changing tools for this soon!)
  • Check the account manually — getting into a rhythm of just checking an account’s total followers is the best way I’ve found to accurately track total followers over time; in theory a script could be written and scheduled that would automatically check this on a recurring basis, but that’s not something I’ve tackled

I also like to check lists and keep track of how many lists the Twitter account is included on. This is a measure, in my mind, of whether followers of the account are sufficiently interested in the brand or the content that they want to carve it off into a subset of their total followers so they are less likely to miss those tweets and/or because they see the Twitter stream as being part of a particular “set of experts.” Twitalyzer looks like it trends list membership over time, but, since I just discovered that it now does that, I can’t stand up and say, “I use that!” I may very well start!

Referrals to the Brand’s Site

This doesn’t always apply, but, if the account represents a brand, and the brand has a web site where the consumer can meaningfully engage with the brand in some way, then measuring referrals from Twitter to the site are a measure of whether Twitter is a meaningful traffic driver. There are fundamentally two types of referrals here:

  • Referrals from tweeted links by the brand’s Twitter account that refer back to the site — these can be tracked by a short URL (such as, by adding campaign tracking parameters to the URL so the site’s web analytics tool can identify the traffic as a brand-triggered Twitter referral, or both. The campaign tracking is what is key, because it enables measuring more than simply “clicks:” whether the visitors are first-time visitors to the site or returning visitors, how deeply they engaged with the site, and whether they took any meaningful action (conversions) on the site
  • “Organic” referrals — overall referrals to the site from Depending on which web analytics tool you are using on your site, this may or may not include the clickthroughs from links tweeted by the brand.

By looking at referral traffic, you can measure both the volume of traffic to the site and the relative quality of the traffic when compared to other referral sources for the site.

(If the volume of that traffic is sufficiently high to warrant the effort, you may even consider targeting content on the landing page(s) for Twitter referral traffic to try to engage visitors more effectively– you know the visitor is engaged with social media, so why not test some secondary content on the page to see if you can use that knowledge to deliver more relevant content and CTAs?)

Word Clouds with Wordle

While this isn’t a technique for performance management, it’s hard to resist the opportunity to do a qualitative assessment of the tweets to look for any emerging or hot topics that warrant further investigation. Because all of the tweets have been captured, a word cloud can be interesting (see my eMetrics post for an example). Hands-down, Wordle makes the nicest word clouds out there. I just wish it was easier to save and re-use configuration settings.

One note here: you don’t want to just take all of the tweet content and drop it straight into Wordle, as the search criteria you used for the tweets will dwarf all of the other words. If you first drop the tweets into Word, you can then do a series of search and replaces (which you can record as a macro if you’re going to repeat the analysis over time) — replace the search terms, “RT,” and any other terms that you know will be dominant-but-not-interesting with blanks.

Not Exactly the Holy Grail…

Do all of these techniques, when appropriately combined, provide near-perfect measurement of Twitter? Absolutely not. Not even close. But, they’re cheap, they do have meaning, and they beat the tar out of not measuring at all. If I had to pick one tool that I was going to bet on that I’d be using inside of six months for more comprehensive performance measurement of Twitter, it would be Twitalyzer. It sure looks like it’s come a long way in the 6-9 months since I last gave it a look. What it does now that it didn’t do initially:

  • Offers a much larger set of measures — you can pick and choose which measures make sense for your Twitter strategy
  • Provides clear definitions of how each metric is calculated (less obfuscated than the definitions used by Klout)
  • Allows trending of the metrics (including Lists and Followers).

Twitalyzer, like Klout, and Twitter Counter and countless other tools, is centered on the Twitter account itself. As I’ve described here, there is more going on in Twitter that matters to your brand than just direct engagement with your Twitter account and the social graph of your followers. Online listening tools such as Nielsen Buzzmetrics can provide keyword-based monitoring of Twitter for brand mentions and sentiment — this is not online listening per se, really, but it is using online listening tools for measurement.

For the foreseeable future, “measuring Twitter” is going to require a mix of tools. As long as the mix and metrics are grounded in clear objectives and meaningful measures, that’s okay. Isn’t it?

Similar Posts:


  1. Nice post Tim. Thanks for wading through many of the tools for the rest of us. I think the landscape you’ve covered here isn’t as fragmented as the listening space so it may be an easier thing to tackle and ultimately deliver to analysis consumers.

    However, I am surprised that none of these tools seem to have made it into the portfolios of the primary web analytics vendors. I’m not sure what the integration would look like, but those guys always seem to come up with something.


  2. Tim, this is an impressive discussion of how to analyze tweets – very thorough and helpful. Thank you for putting this together!

    Have you looked at TweetReach or RowFeeder for other Twitter analytics options? [Full disclosure: I’m co-founder of Appozite, the company that makes TweetReach.] They each provide some of the metrics and features you talk about in this post, and could be useful additions to a Twitter analysis toolkit. I’d love to hear your thoughts about them.

    Again, thanks for such a comprensive writeup.

Leave your Comment

« »