Data Visualization — March Madness Style


I got an email last week, just a few hours into Round 1 of this year’s NCAA men’s basketball tournament. The subject of the email was simply “dumb graph,” and the key line in the note was:

The “game flow” graph…how in the WORLD is that telling me anything? That the score goes up as the game goes on? Really? Ya think?

My friend was referring to the diagrams that ESPN.com is providing for every game in the tournament. The concept of these graphs is pretty simple: plot the score for each team over the course of the game. For instance, the “Game Flow” graph for the Oklahoma vs. Morgan State game looks like this (you can see the actual graph on the game recap page — just scroll down a bit and it’s on the right):

Oklahoma vs. Morgan State

This isn’t an exact replication, but it’s pretty close: the best I could manage in Excel 2007 (the raw data is courtesy of the ESPN.com play-by-play page for the game). ESPN’s graph is a Flash-based chart, so it has some interactivity that the image above does not (we’ll get to that in a bit).

The graph shows that the game was tight for the first 4-5 minutes, then Oklahoma pulled away, Morgan State made it really close midway through the first half, and then Oklahoma pulled away again and never looked back. My friend had a point, though: the dominant feature of the graph is that both lines trend up and to the right, and any chart of a basketball game is going to exhibit that pattern. (Actually, the play-by-play for that game has a couple of hiccups: when I originally pulled the data, there were a couple of places where the score went down due to out-of-sequence free throw placement, but I noticed the issue and fixed it.) In business, we’re pretty well conditioned to see “up and to the right” as a good thing, but it’s meaningless in the case of a basketball game.

Compare that graph to a game that was much closer — the Clemson vs. Michigan game (the graph on ESPN’s site is on the recap page, and the raw data is on the play-by-play page):

Clemson vs. Michigan

This was a tighter game all through the first half. Clemson led for the first 7-8 minutes, Michigan pulled substantially ahead early in the second half, and then things got tight in the last few minutes of the game. But, again, both lines moved up and to the right.

These charts are not difficult to interpret:

  • The line on top is the team that is leading
  • The distance between the lines is the size of the lead
  • The lines crossing signifies a lead change

But could we do better? Well, my wife and kids are out of town for the week (spring break), I have the social life you’d expect from someone who blogs about data and data visualization, and the fridge is well-stocked with beer. Party. ON!

At best, my level of basketball fan-ness hovers right around “casual.” Still, I follow it enough to know the key factors of a game update or game upset (Think: “Hey, Joe. What’s the score?”). Basically:

  • Who’s winning?
  • By how much?

(If there’s time for a third data point, the actual score is an indication of whether it’s a high scoring shootout or a low scoring defense-oriented game.)

Given these two factors as the key measures of a game, take another look at the graphs above. When the game is tight, you have to look closely to assess who is winning. And, determining how much they’re winning by requires some mental exertion (try it yourself: look back at the last graph and ask yourself how much Michigan was winning by halfway through the second half).

This is just begging for a Stephen Few-style exercise to see if I can do better.

First, the Oklahoma/Morgan State game:

Oklahoma vs. Morgan State 

Rather than plotting both teams’ scores, with the total score on the Y-axis, this chart plots a single line with the size of the lead; whichever side of the “0” line the plot is on is the team that is winning. The team on the top is the higher seed, and the team on the bottom is the lower seed. I added the actual score at halftime and the end of the game, as well as each team’s seed. Compare that chart to the much closer Clemson/Michigan game:

Clemson vs. Michigan

The chart looks very different. It focuses on the information fans really want and presents it directly, rather than presenting the data in a way that requires mental exertion to derive what the fan is really interested in: who’s winning and by how much? While the graphs on ESPN’s site allow you to mouse over any point in the game and see the exact score and the exact amount of time remaining, it’s hard to imagine who would actually care to do that. Better to come up with an information-rich and easy-to-interpret static chart than to get fancy with unnecessary interactivity.
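For anyone who wants to try the same idea outside Excel, the transformation behind the lead-based chart is small. The following is a minimal Python sketch, not the spreadsheet used for the charts above. The row layout, the name `lead_series`, and the sample values are all assumptions made up for illustration, and the rows are assumed to be in chronological order already.

```python
from typing import List, Tuple

# Each play-by-play row: (seconds elapsed, higher-seed score, lower-seed score).
# Hypothetical layout; the sample values below are invented, not real game data.
ScoreRow = Tuple[int, int, int]


def lead_series(rows: List[ScoreRow]) -> List[Tuple[int, int]]:
    """Return (seconds elapsed, lead) pairs.

    A positive lead means the higher seed is ahead (plotted above the 0 line);
    a negative lead means the lower seed is ahead (plotted below it).
    """
    return [(seconds, high - low) for seconds, high, low in rows]


sample_rows = [(0, 0, 0), (35, 2, 0), (61, 2, 3), (90, 4, 3), (130, 4, 6)]
for seconds, lead in lead_series(sample_rows):
    print(seconds, lead)
```

Plotting the second value of each pair against the first gives the single line described above: the sign says who is winning, and the magnitude says by how much.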

A few other subtle changes to the alternative representation:

  • I tried to dramatically increase the “data-pixel ratio” (Few’s principle that the ratio of actual data to decoration should be maximized) — this is a little unfair to ESPN, as their site is working with an overall style and palette for the site, but it’s still worth keeping in mind
  • I used color on the Y-axis to show which team’s lead is above/below the mid-line. The numbers below the middle horizontal line are actually negative numbers, but with a little Excel trickery, I was able to remove the “-” and change the color of the labels (all done through Custom number formatting)
  • By putting the top seed on the top, looking at a full page of these charts would quickly highlight the games that were upsets
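The axis-label trick in the second bullet relies on Excel custom number formats having separate sections for positive, negative, and zero values, each with an optional color code. The exact format string used for these charts isn’t given here; something along the lines of `[Blue]0;[Red]0;0` would be one guess. The same logic can be sketched in Python, where `axis_label` and its color defaults are hypothetical names chosen for illustration:

```python
from typing import Tuple


def axis_label(value: int, top_color: str = "blue",
               bottom_color: str = "red") -> Tuple[str, str]:
    """Return (text, color) for a Y-axis tick on the lead chart.

    Values below the 0 line are stored as negatives, but the label shows
    the magnitude only, colored for the team plotted on that side.
    """
    if value > 0:
        return str(value), top_color
    if value < 0:
        return str(-value), bottom_color  # drop the "-" sign
    return "0", "black"


for tick in (10, 5, 0, -5, -10):
    print(tick, axis_label(tick))
```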

I’m my own worst critic, so here are two things I don’t like about the alternate charts above:

  • The overall palette still feels a little clunky — the main data plot doesn’t seem to “pop” as much as it should, even though it’s black, and the shaded heading doesn’t feel right
  • While the interpretation of the data requires less mental effort once you understand what the chart is showing, it does seem like this approach requires another half-second of interpretation up front that the original charts don’t require

What do you think? What else could I try to improve the representation?

7 Comments


  1. Tim –

    I like your representations better than the original. I’ve seen similar charts showing a team’s accumulated win-loss differential over a season. You don’t need to show all wins and all losses, just like you don’t need to show all points for and against.

    The thing I might change is to put a vertical line segment at each scoring event, rather than a diagonal line that slopes to it (compulsive? moi?), giving a sharper stepped appearance.

  2. Thanks, Jon!

    I initially broke the game down into 2,400 seconds and built the chart that way, which did give them, for all intents and purposes, the stairstepped look. I got to thinking that was overkill, so backed off to 15-second increments, which introduced the sloped lines. You make a good point, though.

  3. Pingback Interesting Links for 3-April-2009 | PTS Blog

  4. What about using a column chart with 0 gap? This would make the steps more obvious and you could also set the colour of the column to the colour of the team leading.

  5. I wonder if plotting it as a bar chart, with no gap, would give the stair-step effect without having to plot all 2,400 seconds…

    I definitely like the improvement and I’m wondering what an entire bracket would look like plotting out the entire tournament, using smaller versions of the chart….hmm…

  6. I like @Roger and @Michael’s suggestions. I gave it a quick run in Excel 2007, but only half got the “different color based on which team is winning” to work — there is an option to “invert negative values,” but that didn’t really get me to what I was looking for. I’m now thinking this will be my excuse to fiddle around some more with Google Charts to see if I can get the added control there. May not be until next March!

  7. The easiest way to achieve 2 different colours is to use a stacked column with 2 series, where each series has a value of 0 or its lead.
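The suggestions in the comments above (a stair-stepped series sampled at a fixed interval, plus two column series so each team gets its own color) can be sketched in Python. This is only an illustration under stated assumptions: the function names and sample events are made up, the events are assumed to be sorted by time, and the lead is held constant between scoring events.

```python
from bisect import bisect_right
from typing import List, Tuple

GAME_SECONDS = 2400  # two 20-minute halves, matching the 2,400-second version


def lead_at(events: List[Tuple[int, int]], second: int) -> int:
    """Lead in effect at `second`, holding the last value until the next score."""
    times = [t for t, _ in events]
    i = bisect_right(times, second) - 1
    return events[i][1] if i >= 0 else 0


def sample(events: List[Tuple[int, int]], step: int) -> List[int]:
    """Sample the lead every `step` seconds across regulation."""
    return [lead_at(events, s) for s in range(0, GAME_SECONDS + 1, step)]


def split_by_leader(leads: List[int]) -> Tuple[List[int], List[int]]:
    """Two column series: each holds its own team's lead, or 0 when not leading."""
    top = [v if v > 0 else 0 for v in leads]
    bottom = [v if v < 0 else 0 for v in leads]
    return top, bottom


# Invented (seconds elapsed, lead) events, sorted by time; not real game data.
events = [(0, 0), (35, 2), (61, -1), (90, 1), (130, -2)]
leads = sample(events, 15)
top, bottom = split_by_leader(leads)
print(leads[:10])
print(top[:10])
print(bottom[:10])
```

Because each sample holds the last known lead rather than interpolating toward the next one, the values themselves never slope; drawn as zero-gap columns, the two series give the stepped look, with one color above the line and another below it.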
