Stephen Few’s Derivation of Tufte: The Data-Pixel Ratio

I’ve glanced through various folks’ copies of Stephen Few’s Information Dashboard Design on several occasions over the past few years. And, it was a heavy influence on the work that an ad hoc team in the BI department at National Instruments undertook a couple of years ago to standardize/professionalize the work they were putting out.

I finally got around to reading a good chunk of the book as I was flying a three-legged trip out to British Columbia last week…and it is good! One section that particularly struck me started on page 100:

Edward R. Tufte introduced a concept in his 1983 classic The Visual Display of Quantitative Information that he calls the “data-ink ratio.” When quantitative data is displayed in printed form, some of the ink that appears on the page presents data, and some presents visual content that is not data.
:

He then applies it as a principle of design: “Maximize the data-ink ratio, within reason. Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink presents new information.”
:
This principle applies perfectly to the design of dashboards, with one simple revision: because dashboards are always displayed on computer screens, I’ve changed the word “ink” to “pixels.”

I’ll actually go farther and say that “dashboards” can be replaced with “spreadsheets” and this maxim holds true. Taking some sample data straight from Few’s book, and working with a simple table, below is how at least 50% of Excel users would format a simple table with bookings by geographic region:

Look familiar? The light gray gridlines in the background are turned on in Excel by default. Add to that a failure to resist the urge to put a “thin” grid around the entire data set.

Contrast that with how Few represents the same data:

Do you agree? This is clearly an improvement, and all Few really did was remove the unnecessary non-data pixels.
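To put a rough number on the idea, here is a minimal sketch of the ratio itself: data pixels divided by all the pixels that were drawn. The function name and the pixel counts are made up for illustration; they are not measurements of the tables from Few’s book.

```python
def data_pixel_ratio(data_pixels: int, total_pixels: int) -> float:
    """Share of the drawn (non-background) pixels that actually encode data."""
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    if not 0 <= data_pixels <= total_pixels:
        raise ValueError("data_pixels must be between 0 and total_pixels")
    return data_pixels / total_pixels


# Hypothetical counts: the same numbers rendered with and without grid lines.
# The data pixels stay the same; only the non-data pixels shrink.
gridded = data_pixel_ratio(data_pixels=4_000, total_pixels=10_000)
plain = data_pixel_ratio(data_pixels=4_000, total_pixels=5_000)

print(f"with grid: {gridded:.0%}, without grid: {plain:.0%}")
```

Nothing about the data changed between the two calls. Removing non-data pixels is the only lever being pulled, which is the whole point of the principle.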

So, how would I have actually formatted the table? It’s tough to resist the urge to add color, and I am a fan of alternating shaded rows, which I can add with a single button click based on a macro that adds conditional formatting (“=MOD(ROW()+1,2)=0” for shaded and “=MOD(ROW(),2)=0” for not shaded):
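For anyone who wants to see what those two conditions actually select, here is a small sketch of the same parity logic outside of Excel. The function names are mine, and this only mirrors the two formulas; it is not the macro itself.

```python
def shaded_rule(row: int) -> bool:
    """Mirrors =MOD(ROW()+1,2)=0, which is true on odd-numbered rows."""
    return (row + 1) % 2 == 0


def unshaded_rule(row: int) -> bool:
    """Mirrors =MOD(ROW(),2)=0, which is true on even-numbered rows."""
    return row % 2 == 0


# Walk the first few worksheet rows (Excel rows start at 1).
for row in range(1, 7):
    label = "shaded" if shaded_rule(row) else "plain"
    print(row, label)
```

The two rules are complements of each other, so every row matches exactly one of them. That is why the pair can be used to both apply and clear the banding.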

In this case…I’d actually vote for Few’s approach. But, even Few gives the okay to lightly shaded alternative rows later in the same chapter, when some sort of visual aid is needed to follow a row across a large set of data. That’s really not necessary in this case. And, does bolding the totals really add anything? I don’t know that it does.

The book is a great read. It’s easy to dismiss the topic as inconsequential — the data is the data, and as long as it’s presented accurately, does it really matter if it’s presented effectively? In my book, it absolutely does matter. The more effectively the data is presented, the less work the consumer of the data needs to do to understand it. The human brain, while a wondrously effective computer, has its limits, and presenting data effectively allows the brain to spend the bulk of its effort on assessing the information rather than trying to understand the data.

The “Action Dashboard” — Avinash Mounts My Favorite Soapbox

Avinash Kaushik has a great post today titled The “Action Dashboard” (An Alternative to Crappy Dashboards). As usual, Avinash is spot-on with his observations about how to make data truly useful. He provides a pretty interesting 4-quadrant dashboard framework (as a transitional step to an even more powerful dashboard). I’ve gotten red in the face more times than I care to count when it comes to trying to get some of the concepts he presents across. It’s a slow process that requires quite a bit of patience. For a more complete take on my thoughts, check out my post over on the Bulldog Solutions blog.

And, yes, I’m posting here and pointing to another post that I wrote on a completely different blog. We’ve recently re-launched the Bulldog Solutions blog — new platform, and, we hope, with a more focussed purpose and strategy. What I haven’t fully worked out yet is how to determine when to post here and when to post there…and when to post here AND there (like this post).

It may be that we find out that we’re not quite as ready to be as transparent as we ought to be over on the corporate blog, in which case this blog may get some posts that are more “my fringe opinion” than will fly on the corporate blog. I don’t know. We’ll see. I know I’m not the first person to face the challenge of contributing to multiple blogs (I’ve also got my wife’s and my personal blog…but that one’s pretty easy to carve off).

Oh. So THAT’s What Hans Rosling Is Doing at Google…

Yep. I’m living under a rock.

I’d re-stumbled across Hans Rosling and Trendalyzer a couple of months ago. I made a comment about what would happen if Trendalyzer hit the business world. Well, in a way, it sort of has. It’s hanging around under the hood in some fashion, I’m almost sure, of Google’s Visualization API.

Must. Find. Time. To. Play. With. Google Spreadsheets and visualization gadgets.

Zuckerberg/Lacy — a Technical (Data) Twitter Analysis

At the top of the list of blogs I follow is Jeremiah Owyang’s Web Strategist blog. He posts frequently, with depth, and with insight. However, I was in the midst of a hectic week in Austin when he posted his Analysis of the Zuckerberg Lacy Interview, and, frankly, while the title persisted in a couple of places (not only in his feed, but in my Yahoo! Pipe feed on data from non-data blogs because of the word “analysis” in the title), I’d been pretty much Zuckerberg-Lacy’d out, and I thought this was going to be a “my take on what happened” type of “analysis.” I was wrong.

The one-sentence recap of the actual incident: Sarah Lacy, of Business Week, interviewed Mark Zuckerberg, founder and CEO of Facebook, at SXSW Interactive in Austin, and the crowd, which was none too happy with the way the interview was going, pretty much took over using Twitter as a backchannel of communication. Google it and you can read scads more as well as see video of the event.

Well, I finally got around to looking at Jeremiah’s post, and I’ll be damned if it wasn’t a brief post linking to a really interesting TechnoSocial post: Anatomy of a Mob: The Lacy/Zuckerberg Interview. Kee Hinckley sifted through a bunch of Twitter data to try to get some insight into what really went on through Twitter during the keynote. It’s a fascinating read.

What I want to point out, though, is not so much the results of the analysis, but some pretty darn noteworthy aspects of what went into it.

First, I immediately started wondering how on earth Hinckley figured out which Twitter users were at the keynote. As it turns out, he explains it — recognizing that it’s imperfect, but, by golly, still pretty clever! And, it took a mix of tools, some level of clunky automation (no one likes to do screen scraping), and quite a bit of flat-out manual effort. He revised what he included/excluded as he got into the manual part of the exercise. What’s Noteworthy: the data Hinckley wanted was not easily accessible (the data always requires more prep work than most people realize), and it required some judgment when it came to getting it. That’s stepping out of a formulaic approach to analysis of “pull the data that’s available and present it.”

Second, the visualization. I am almost always opposed to 3D representations of data, and categorically so when it’s two dimensions of data presented with “depth” (that’s just silly). But, even when it’s three dimensions presented in three dimensions, more often than not, the result is uninterpretable. Not the case here! Hinckley steps outside of the box to think about ways to effectively visualize the data, which takes much, much more thought than simply “How do I get all of the data displayed?” He even includes two different charts, one with bubbles and one with a color spectrum, of the same data, clearly grappling with how best to show the information (both work, IMHO). What’s Noteworthy: All too often, I see analysts go through all of the hurdles of prepping the data and “running the numbers” only to take shortcuts when it comes to the visual representation of the results. That’s the equivalent of running 25.5 miles of a marathon really hard…and then going home.

Finally, Hinckley puts a lot of text-based interpretation behind his analysis. In this case, he clearly had the question, ran with trying to find the answer, and took responsibility for explaining the whole process and the results. And, he did all three swimmingly! What’s Noteworthy: In many situations, one person is asking the question, while an analyst tries to find the answer. It’s that third area — explaining the process and results — where many analysts decide not to tread. Rather, they “do the analysis” and turn the “results” (mostly the data, including charts) over to the original requestor to interpret and explain. This bothers me. I much prefer to see an analyst actually draw conclusions and provide real context and interpretation. Whether they are expected to or not! It’s up to the original requestor to decide whether to use that information and how. More often than not, it gets used. To good results.

Overall, it’s a fascinating read. Top. Notch. Work!

Sometimes, the Data DOES Paint a Clear Picture

I’ll admit right up front that this is the least value-add post on this blog to date. Part of me sincerely hopes that it holds that distinction indefinitely. But, I know me better than that, so no promises.

We all have them. Those moments where someone says something — in person, in an e-mail, in an instant message — that triggers a completely random, but oddly inspired, response.

What happened: One of my pet peeves is the cliche, “If you can’t measure it, don’t do it.” It sounds good, but I challenge any company to fully apply this overly simplistic maxim and survive. I’m all for having a bias towards measurement, but I get nervous when people speak in absolutes like this.

Earlier this week, I fired off an internal e-mail proposing an initiative that was extremely low cost that seemed like a good idea to me. It really wasn’t an initiative where it made sense to try to quantify the benefits, though. I made a comment as such in the e-mail — that, despite it not being practical to measure the results, I still thought it was a good idea. (I was having one of the 15-20 snarky moments I have throughout any given day.) Two of the five people on the distribution list immediately responded with demands for an ROI estimate.

FLASH!

10 minutes later, and I’d fashioned the following chart in Excel and responded to the group with my analysis:

The Bird

Everyone had a good chuckle. 

Here’s the spreadsheet file itself. It’s as clean as clean can be, so feel free to snag it and put it to your own use. If you put it to use with entertaining results, I’d appreciate a quick comment with the tale. Or, if you make modifications to enhance the end result, I’d love to get a copy.

Enjoy.

Depth vs. Breadth, Data Presentation vs. Absorption, Frank and Bernanke

For anyone who knows me or follows this blog, it will be no surprise that I can get a bit…er…animated when it comes to data visualization. Partly, this may be from my background in Art and Design. I got out of that world as quickly as possible, when I realized that I lacked the underlying wiring to really do visual design well.

As a professional data practitioner, I also see effective data visualization as being a way to manage the paradox of business data: the world of business is increasingly complex, yet the human brain is only able to comprehend a finite level of complexity. And, while I love to bury myself up to my elbows in complex systems and processes, I’m the first person to admit that my eyes glaze over when I’m presented with a detailed balance sheet (sorry, Andy). A picture is worth a thousand words. A chart is worth a thousand data points. That’s how we interpret data most effectively — by aggregating and summarizing it in a picture.

So, it’s pretty important that the picture be “drawn” effectively. I had a boss for a year or two who flat-out was much closer to Stephen Hawking-ish than he was to Homer Simpson when it came to raw brainpower. He took over the management of a 50-person group, and promptly called the whole group together and presented slide after slide of data that “clearly showed”…something or other. The presentation has become semi-legendary for those of us who witnessed it. The fellow was facing a room of blank-confused-bored-bewildered gazes by the time he hit his third slide. Now, to his credit, he learned from the experience. He still looks at fairly raw data…but he’s careful as to how and where he shares it.

All that is a lengthy preamble to a Presentation Zen post I read this evening about Depth vs. Breadth of presentations. It’s a simple concept (meaning I can understand it), with some pretty good, rich examples to back it up. The fundamental point is that none of us spend very much time thinking about what to cut from our presentations. I would extend that to say we don’t spend very much time thinking about what data not to share or show. It’s easy to see this as a case for “make the data support what you want it to,” which it is not. At all! Really, it’s more about focussing on showing the data — and only the data — that directly relates to the objectives you are measuring or the hypotheses that you are testing.

Then, focus on presenting that data in a way that makes it clear as to what story it is telling. You do the hard work of interpreting the data. Then, highlight what is coming out of that interpretation. If there is ambiguity, highlight that, too. If there is a clear story, and your audience gets it, and you then introduce an anomaly, you’re much more likely to have a fruitful, engaging discussion about it. You will learn, and your audience will retain!

In the end, this is a riff on a bit of a tangent, I realize. Robert Frank presents some fairly alarming evidence of college professors aiming for broad and deep…and not gaining any better retention than the slide-happy, chart-crazy PowerPoint users provide in the business setting. He goes on to talk about how, in his teaching, he makes a point, repeats it, comes at it from a different angle, makes the students think about it, and then repeats it again. He goes for deep. His students, I’m sure, leave his introductory economics class with a thoroughly embedded (and accurate) understanding of “opportunity cost” (having seen the term mis-applied more than once in my day…and still having to struggle to get to the correct answer…and barely…and barely in time…in his presentation…I applaud that!).

I’m not arguing for simplicity for simplicity’s sake. I’m arguing for going deep, understanding the complexity, and then distilling it down to a narrative, cleanly presented, that leaves your audience with takeaways that are accurate and absorbed.

And…on that note, have any of you read The Economic Naturalist? It sounds like it would be right up my alley. It’s just a bonus that, if I ever actually attended something that could be labeled a “cocktail party,” I could talk about how I’d “read some of Bernanke’s work!”

Hans Rosling and Trendalyzer

It’s been a long, crazy week, and it’s only Tuesday. Geesh! But, I didn’t want to let things slide too long here, so this is just a quickie in case you haven’t seen Trendalyzer (um. yeah. you WANT to click on that link). It’s been around for a few years, and a co-worker actually sent it to me some time back. But, I’d kind of forgotten about it, until it came up…twice…in the past week. First was an old boss of mine, who had just watched Rosling’s TED Talk where he used Trendalyzer. Talk about data visualization done right!

Trendalyzer

I did some snooping around, and found out that Google had bought Trendalyzer. I mentioned this to a couple of web analytics colleagues over lunch one day, and, the next thing I knew, one of them was pointing out that Avinash Kaushik had noted how he works 10 meters away from Rosling at Google (see #6). So, apparently Google bought Rosling along with his technology!

I am admittedly a bit in awe of Rosling. It’s not just that he’s an incredibly sharp, passionate individual. It’s that he actually came up with a way to effectively represent 4 dimensions of data in a meaningful way. By illustrating time…using time…what he does is both simple and powerful.

Sadly, if Trendalyzer hits the business world, I’m sure it will be grossly misused. Never underestimate the ability of a business user to grossly misapply a great technology!

Guy Kawasaki (Almost) Says 3-D Graphs Are Evil

Guy Kawasaki posted Ten Questions with Garr Reynolds (author of Presentation Zen: Simple Ideas on Presentation Design and Delivery). Question number 10 (which, as it turns out, was not the last question, as, in an apparent nod to Douglas Adams, Kawasaki actually included 13 questions): “Why do you think 2-D graphs are better than 3-D graphs?”

Answer: 3D charts and graphs are very popular with consumers, but in almost every case it is preferable to use 2-D graphics to display 2-D data. Charts with 3-D depth and distortion usually make things harder to see, not easier. Some of the precision is lost. There is beauty in the simple display of the data itself, there is no need to decorate with distorted perspectives. If the graphic is just for showing the roughest of general trends, then there is nothing really wrong with a 3-D chart I suppose, but when you are trying to show a true visual representation of the data in the clearest way possible, a simple chart without 3-D adornment is usually better.

<sniffle>Pardon me while I wipe a tear from my eye.

I’ve ranted about 3-D charts and graphs before. And since. And a third time.

It’s not just me!

The only issue I have is with Reynolds’s supposition that 3-D charts are okay for showing the roughest of general trends. I’d call that the same as saying it’s okay to unload your shotgun at some quail with a friend (or at least a large donor) within range of the spray of pellets. It’s not okay. It’s just not. Unless you are super-duper qualified (meaning you make a good living as a professional graphic designer or artist), don’t do it!

What happens if you combine 3D WITH fading axes?

I really hope this blog doesn’t become just a continuous series of rants around my pet peeves regarding data visualization. But…I stumbled across the following in a white paper about measuring visitor engagement on a web site.

This chart was generated from Unica’s Affinium NetInsight product. Unica is a great product, from everything I’ve heard. And, they seem to have a sharp acquisition strategy, based on their buying Sane (with NetTracker) several years ago.

But…this chart is pretty awful. It highlights how downright silly a gratuitous third dimension is. But, the “gradient to white” move on the Y-axis was a new one on me. Search engines 1 through 5 all actually have a value of 3. I did manage to figure this out without looking at the data itself in the report…but it was 3 beats more of mental exertion than it should have been.

Vitriolic Rant Redux — 3D Pie Charts

Pie charts are generally bad enough. Mainly, because they take a lot of real estate to provide pretty limited information. But, they do have their place. That place is showing the relative relationship of the parts of a whole when there is no time dimension.

3D pie charts, though, are simply horrid! They actually misrepresent the data and remove whatever instantaneous clarity that a flat pie chart provides.

In the pie chart above, which product has the greatest portion of the whole?

Product B. That’s not too hard.

Which is greater, Product A or Product D?

Trick question. They’re the same. And, you probably figured that out. But, in order to do so, your brain had to undo the 3D effect, since when it comes to raw area shown, Product A is larger.

When asked a direct question like, “Which is greater, Product A or Product D,” this isn’t too hard to do. But, that’s not usually the approach of interpreting visual displays of data. Rather, the viewer looks at it and says, “What does this chart tell me?” In a 3D pie chart, your brain has to spend extra cycles doing the A vs. D comparison for every wedge in the pie. And it gets pretty hairy when you’ve got, say, 10 or more wedges. What’s happening is your brain has to go through a (subconscious, but real) effort to remove the 3D effect. That’s an effect that somebody else wasted brain cycles and effort on adding in the first place.
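To make the “raw area shown” point concrete, here is a rough geometric sketch. It assumes a very simple model of a 3D pie: the disc is squashed vertically to fake the tilt, and a side wall of fixed on-screen height is drawn below the front rim. The squash factor, wall height, and wedge positions are all invented for illustration; they are not taken from the chart in this post.

```python
import math


def visible_area(start_deg: float, end_deg: float,
                 radius: float = 1.0, squash: float = 0.5,
                 wall: float = 0.3) -> tuple[float, float]:
    """Return (top_face_area, side_wall_area) on screen for one wedge.

    Angles run counter-clockwise from 3 o'clock, with
    0 <= start_deg <= end_deg <= 360. The front rim of the pie,
    where the side wall is visible, spans 180 to 360 degrees.
    """
    sweep = math.radians(end_deg - start_deg)
    # Squashing the disc scales every wedge's top face by the same factor,
    # so top-face areas stay proportional to the underlying values.
    top = squash * radius ** 2 * sweep / 2

    # Only the part of the wedge's arc on the front rim shows a side wall.
    lo = max(start_deg, 180.0)
    hi = min(end_deg, 360.0)
    side = 0.0
    if hi > lo:
        # On-screen wall area is its height times the horizontal span of the arc.
        span = radius * (math.cos(math.radians(hi)) - math.cos(math.radians(lo)))
        side = wall * span
    return top, side


# Two equal 90-degree wedges, standing in for Product A (front) and Product D (back).
front_top, front_side = visible_area(225, 315)
back_top, back_side = visible_area(45, 135)

print("front wedge:", round(front_top + front_side, 3))
print("back wedge: ", round(back_top + back_side, 3))
```

Under these assumptions the two top faces come out identical, but the front wedge also gets credit for the side wall, so it occupies noticeably more of the screen. That extra area is what your brain has to subtract back out.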

This is the sort of inefficiency that process improvement folk salivate over finding in a manufacturing environment: “Person A unwraps a widgetlet and then screws it on to a doohickey and sends it to the next station. Person B then unscrews the widgetlet, inserts a washer, and then screws it back on in the exact same spot.” Obviously, if Person A didn’t screw the widgetlet on in the first place, then the process would have two steps removed: Person A’s screwing on of the widgetlet and Person B’s unscrewing of it.

It’s the same deal with 3D pie charts.