Monday, February 24, 2014

Tell a story you believe and can justify.

Though this post is a bit of a follow-up to a previous post, I was motivated to write it after a tweet from Roger Peng of JHU and Simply Statistics.

For context, he is referring to something specific that has absolutely nothing to do with me or this blog. Nevertheless, after blogging a few rants about not being dogmatic (what not to do), it's time to be more constructive.

Tell a story you believe and can justify. I am no expert, but I believe that we humans are programmed to learn and digest information best when it's delivered as a story. In my opinion, this makes sense; like quantitative models, stories can condense a lot of information (data) into a few salient points (a model). Stories may also ascribe causality, giving the audience an ability to understand and perhaps shape similar events (as with a causal model). And if events are similar, perhaps the audience will know what's coming (as with a predictive model).

However, stories, like models, have their pitfalls and can mislead, whether through mistakes or malice. Tyler Cowen argues (rightly, IMHO) that we should be cautious when faced with stories. (Full video here.) Stories may be biased or based on partial information; they may give the illusion of certainty where there is little to none, conflate correlated events with causal ones, etc. These issues should be very familiar to us students of statistics and data science.

But how do you tell a story if you are suspicious of stories? This is where justifiability comes into play. If we, as professionals, are aware of the above issues and are ethical (i.e. we have no agenda beyond trying to be as objective as a human can be), then we must constrain our story so that a reasonable person who understands the issue won't immediately identify it as problematic. Easy, right?

Also, isn't this blog about statistics and such? Yes, I was just getting to that.

Most of my applied statistical background is in the social sciences and public policy, where the audience is looking for the story, not a p-value or credible interval. The story may be a chart, a table, or literally a story (with all the fun statistics in an appendix). In fact, too much information can muddy the waters, leaving the audience confused. But as I do my research, I try to keep one thing in mind: what if someone challenges our story on technical grounds? Can I justify what we've done?* Do I believe the story we're telling?

In practice, then, it's safe to put the chart with the ski-slope drop in the employment ratio and truncated axes up front as your bottom line, so long as you can produce other evidence that the drop you're showing is historically large. It's safe to put your parsimonious five-variable regression out there as "the" model when you've got a dozen other "reasonable" models in an appendix backing up your choice (preferably fit on subsamples of the data, or recursively if it's a time series).
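The subsample check above can be sketched in a few lines. This is a minimal illustration, not the post's actual analysis: the data, variable count, and thresholds are all hypothetical. The idea is simply to refit the same OLS specification on random subsamples and see whether the coefficients stay stable, which is one piece of evidence that the model's story is justifiable.

```python
# Hypothetical robustness-check sketch: fit a "main" OLS model on the full
# sample, refit it on random subsamples, and compare coefficient stability.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data standing in for your real variables.
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_full = ols(X, y)  # the "main" model's coefficients

# Refit on 20 random half-samples; if the story holds up,
# the coefficients should not move around much.
subsample_betas = np.array([
    ols(X[idx], y[idx])
    for idx in (rng.choice(n, size=n // 2, replace=False) for _ in range(20))
])
spread = subsample_betas.std(axis=0)  # coefficient variability across subsamples

print("full-sample beta:", np.round(beta_full, 2))
print("subsample std:   ", np.round(spread, 2))
```

If the subsample spread is large relative to the full-sample estimates, that's the cue from the next paragraph: the story isn't justifiable yet.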

But if a longer time series or a transformation makes that drop look relatively minor, or if every model you've tried tells a different story, or if the Bayesian approach smashes your frequentist approach (or the other way around), then your story isn't justifiable yet. Rather than get disheartened, ask yourself, "why are these things different when I'd expect them to be the same?" Because then you might really be on to something interesting.

*Note that "justify" doesn't necessarily mean "win" in response to a challenge. I mean it as not omitting something that obviously should have been done, and not doing something that obviously should not have been done.
