Calculating Trend Indicators
By Tim Wilson on in Data Visualization, Metrics with 8 Comments
Put this down as one of my more tactical posts, brought on by a fit of lingering annoyance with the use (and by “use” I mean “grotesque misuse”) of trend indicators on reports and dashboards. The trouble is that trends are a trickier business than they seem at first blush, and, at the same time, there are a number of quick and easy ways to calculate them…that are all problematic.
With the well-warranted increasing use of sparklines, which are inherently trend-y representations of data, I like to be able to put a meaningful trend indicator that complements the sparkline. Throughout this post, I will illustrate trendlines, but I’m really focussed on trend indicators, which are a symbol that indicates whether the trend in the data is upward, downward, or flat. Although there are a few minor tweaks I’d love to make once Excel 2010 is released and allows the customization of icon sets, I’m reasonably happy with their 5-arrow set of trend indicators:
They’re clean and clear, and they work in both color and in black and white. And, with conditional formatting, they can be automatically updated as new data gets added to a dashboard or report. While I won’t show these indicators again in this post, the trendlines I do show are the behind-the-scenes constructs that would manifest themselves as the appropriate indicator next to a sparkline or numerically reported measure.
I’ll use a simple 12-period data set throughout this post to illustrate some thoughts (not as a sparkline, but the principles all still apply):
Trends are slippery beasts for several reasons:
- Noise, noise, noise — all data is noisy, which means it’s easy to over-read into the data and spot a trend that is not really there
- The aircraft carrier vs. the speedboat conundrum — the more data points you use, the more stable your trend, but the longer it takes to collect enough data to identify a trend, or, worse, to determine if you’ve truly impacted the trend going forward
Let’s start this exploration by walking through some of the common ways that “trend” judgments get made and point out why they’re troubling. I will then show an alternative that, while only marginally more complex to implement, works better when it comes to specifying trend-age.
Trending Approaches of which I’m Leery
Trending Based on the Change Over the Previous Period
The most common way I see trends reported is on a “change since the previous period” basis.
In this example, the trend would be an “up” because the data went up from the prior period to the current period. The problem with this is that, if you look at the longer pattern of data, you see that the data is pretty noisy, and it’s entirely possible that this “trend” is entirely a case of noise masking the true signal.
Trending Over an Extended Period
Another way to trend your data, which Excel makes very simple, is to add a trendline using Excel’s built-in trending capabilities (converting this trendline to an indicator would require some use of a couple of Excel functions that I’ll go into a bit in my recommended approach later in the post).
With this method, the trend would be indicated as “slightly up.” While this may be a valid representation of the overall trend…it seldom seems quite right to use it. The trend gets impacted heavily by any sort of big spikes (or dips) in the data. These keep the same upward or downward trend for a very long period of time. I had a blog post during March Madness one year that wound up driving a big spike in traffic to my site. While it was legitimate for that spike to show an upward trend when I looked at my traffic that week or month, that spike has now wreaked havoc on the macro trend indicator that Google Analytics has shown ever since — for several months that spike kept my overall trend up, and, then, once that spike passed the fulcrum of the tool’s trend calculation, it caused the reporting of a downward trend for severals subsequent months. Through the whole period, I had to mentally discount what the trend indicator showed.
Because seasonality wreaks havoc with trendlines, it’s not uncommon to see trend indicators based on year-over-year results — if the current reporting period is a higher number than the same period a year ago, then the trend is up. For trending purposes, this combines the worst of the two prior examples — it takes a very small number of data points (subjecting the assessment to noise) and it uses ancient history data in the equation.
This isn’t to say that comparisons to the same period in the prior year (or even the same period in the prior quarter, since many companies see an intra-quarter pattern) are bad. But, the question those comparisons answer differs from a trend: a trend should be an indication of “where we are heading of late such that, if we continue on the current course, we can estimate whether we will be doing better or worse next week/next month,” while a year-over-year comparison is more a measure of “did we move positively from where we were last year at this time?”
Trending Approaches I Feel Better About
I’ve spent an embarrassing amount of time thinking about trending over the past four or five years, but I’ve finally settled on an approach that meets all of these criteria:
- It balances the number of data points available for the trend with the sluggishness/timeliness of the results
- It’s reasonably intuitive to explain
- It passes the “sniff test” — while a trend indicator may initially be a little surprising, on closer inspection, the user will realize it’s legit
The last bullet point is really a combination/result of the first two.
My Failed Exploration: Single Point Moving Range (mR)
Because of criteria above, I’ve discarded what I thought was my most promising approach — using the single point moving range (mR). A light bulb went off last spring when I took an intermediate stats class, and, although the professor glossed over the moving range formulas, I thought it was going to be the answer that would allow me to solve my trendline quandary — it would look at the “change over previous period” and determine if that change was sufficiently large to warrant reporting a measurable trend. After noodling with it quite a bit… I don’t think that it works for the purposes of trend indicators. For chuckles, a moving range chart for the example in this post looks like the following:
If you want to read more about moving ranges, the best explanation I found was on the Quality Magazine web site. I’ll just stop there, though. We’ve already lost on the “reasonably intuitive” front, and I haven’t even calculated the control limits yet!
And Another Failed Exploration: the Moving Average
There’s also the “moving average” approach, which smooths things out quite a bit:
I always feel like the moving average is some sort of narcotic applied to the data — it makes things fuzzy by having a single data point factored into multiple points represented on the chart. But, I’ll grudgingly admit that it does have its merits in some cases.
My Approach to Trending (At Last!!!)
There are two key elements to my trending approach, and neither is particularly earth-shattering:
- Break the data into smaller components than the reporting cycle
- Trend only over recent data, rather than over the entire reported timeframe
Going back to the original example here, let’s say that I update a dashboard once a month, and that the dashboard primarily looks at data for the prior 3 months. In that case, the 12 data points each represent (roughly) one week. IF I simply reported the data on a monthly basis, then the chart would look like this:
That shows a clear upward trend, regardless of whether I look at the last month or the last two months of data. It would be hard not to put an upward trend indicator on this plot. But, we’re relying on all of three data points, and we’re going back three full reporting periods to draw that conclusion. Both of these are a bit concerning. Invariably, we’d want to go back farther in time to get more data points to see if this trend was real…and then we’re falling into the aircraft carrier dilemma.
Instead, though, I can keep the granularity of the reporting at a week, but only trend over the last four periods:
I don’t actually plot the trendline shown in the chart above. Rather, I calculate the formula for the line using the SLOPE and INTERCEPT Excel functions. I then calculate the value of the 4-weeks-ago endpoint of the line and the most-recent-week endpoint of the line and look at the percentage change from one to the other. I actually set some named cells in my workbook to specify how many periods I report over (so I can vary from 4 to 6 or something else universally) as well as what the different thresholds are for a strong up, weak up, no change, weak down, or strong down trend.
In the example in this post, the change is a 16% drop, which usually would garner a “strong down” trend — very different from all the upward trends in the early examples! And, even somewhat counter-intuitive, as the most recent change was actually an “up.” If the entire range has been trending upward, as shown by the 3-point plot as well as by a close inspection of the raw basic data (think of it as a sparkline), then you already have that information available as the longer term trend, but, of late, the trend seems to be somewhat downward.
A Note of Caution
This post has gone through what works for me as a general rule. As I read back over it, I realize I’m setting myself up for a case of, “Yeah, you CAN make the data say whatever you want.”
I’m less concerned about prescribing a universally-effective approach to trend calculation as I am about putting out a cautionary tone on the various “obvious” ways to calculate a trend. The sniff test is important — does the trend work for your specific situation when you actually apply it? Or, have you adopted a simplistic, formulaic approach that can actually provide a very clear misrepresentation of the data?
And…a Nod to Efficiency and Automation
The prospect of introducing SLOPE and INTERCEPT functions may seem a little intimidating from a maintenance and updating perspective, but it really doesn’t need to be. By using built-in Excel functionality, these can be set up once and then dynamically updated as new data comes in. I like to build spreadsheets with a data selector so that the dashboard is a poor man’s BI tool that allows exploring how the data has changed over time. The key is to use some of Excel’s most powerful, yet under-adopted, features:
- Conditional formatting — especially in Excel 2007 where conditional formatting can make use of customized icon sets
- Named cells and named ranges — these are handy for establishing constants used throughout the workbook (thresholds, for instance) that you may want to adjust
- Data validation — using a cell as your “date range selector” that references a named range of the column that lists the dates for which you record the data
- VLOOKUP — because you used data validation, you can then use VLOOKUP to find the current data based on what is selected by the user
- Dynamic charts — these actually aren’t a “feature” of Excel so much as the clever combination of several different features; Jon Peltier has an excellent write-up of how to do this
If set up properly, a little investment up front can make for an easily updated report delivery tool…with meaningful trend indicators!