Thursday, January 30, 2014

Lies or contextually relevant reporting?

In a recent blog post titled Lies, Damn Lies. “Data Journalism” and Charts That Don’t Start at 0, the author takes issue with the chart accompanying the tweet below from Heidi N. Moore.

Instead, we are told, this chart should have its y-axis start at zero. And in so doing, we see that the employment ratio is "nowhere near a 'ski jump.'" (The data are also broken out between men and women.)

Really? It is important to provide context when displaying data. In the context of these data, is it realistic that this employment ratio would ever be zero or even near zero? No? Then zero has no business being on the chart's axis and including it is the real distortion here.

But if I were to take issue with both of these charts, my peeve would be that the levels of a time series are often misleading. Let's look instead at the percent change in this ratio from a year earlier.

Whoa! What's up with that historically low drop starting in 2008? It's almost like "falling off a cliff" or "a ski jump" or whatever hyperbole you choose.
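For anyone who wants to try the same transformation, here is a minimal sketch of a percent change from a year earlier in plain Python. The function name `pct_change_year_ago`, the `periods_per_year` parameter, and the quarterly numbers are all made up for illustration; they are not the data behind the chart.

```python
def pct_change_year_ago(values, periods_per_year=12):
    """Percent change of each observation from the same period one year earlier.

    The first `periods_per_year` entries have no year-ago value, so they are None.
    """
    changes = []
    for i, current in enumerate(values):
        if i < periods_per_year:
            changes.append(None)  # no observation a year earlier to compare with
        else:
            previous = values[i - periods_per_year]
            changes.append(100.0 * (current - previous) / previous)
    return changes


# Hypothetical quarterly ratios (illustrative only, not actual employment data).
ratio = [60.0, 60.0, 60.0, 60.0, 60.0, 59.4, 58.5, 57.0]
yoy = pct_change_year_ago(ratio, periods_per_year=4)
```

Because each value is compared with the same period a year before, the result is expressed in relative terms, and the question of where the y-axis of the level chart starts no longer matters.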

Bottom line: context is important, not arbitrary axis rules. While Heidi Moore's chart was not perfect, it still got the right message across within the context of the story: employment took a nose dive going into the recession. In fact, such a drop is unprecedented over the history of the displayed data.


  1. Using charts to highlight the context of any particular issue or joie d'affair is similar to using a zoom lens on a Canon EOS: the tighter the shot, the more focus but the less perspective. In acute issues where short-term analyses are prudent, the tighter lens may be appropriate, but in chronic issues where long-term analyses are required, you want the smoothing effect of zero. Budgets, economic activity, social science research, etc., all benefit from the use of zero. Just my two cents...

    1. I understand the arguments put forward about not plotting misleading graphs. ('t_draw_misleading_graphs) But insisting that zero be on the y-axis is arbitrary. It's context that should dictate the scale of your y-axis (and the range of the x-axis), not an arbitrary value. Consider what zero would mean for the above series: nobody would be employed. That isn't a realistic value for those variables to take. And if I were to measure those series as deviations from their means, they'd be centered on zero, but there'd be no new information.
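The point about deviations from the mean can be checked with a small sketch. The numbers below are invented for illustration: subtracting the mean centers the series on zero, yet the period-to-period movements are exactly what they were before.

```python
from statistics import mean

# Hypothetical ratios (illustrative only, not actual employment data).
ratio = [64.0, 63.0, 61.0, 60.0]

# Re-express the series as deviations from its own mean.
center = mean(ratio)
deviations = [x - center for x in ratio]

# Period-to-period changes before and after centering.
steps_original = [b - a for a, b in zip(ratio, ratio[1:])]
steps_centered = [b - a for a, b in zip(deviations, deviations[1:])]
```

Here `deviations` sums to zero, while `steps_original` and `steps_centered` are identical, so putting zero on the axis this way shifts the labels without adding information.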