The Problem With "Data-Storytelling"

So much of the data science community focuses on “data storytelling.” We tell ourselves we’re weaving threads of data into beautiful tapestries, then spinning these yarns to audiences on the edge of their seats, hungry for just another helping of data-driven insight.

But the truth is most practical data science “storytelling” is more like reporting. You’re collecting information, synthesizing it, and sending it back out edited and focused for your audience. In some cases these presentations can be investigative exposés, but in many cases they’re more akin to sportswriting: this is what happened (or will likely happen) and some interesting stats. There’s no drama, no characters, no flowery details. In most cases you’re presenting to a business team who wants your insights, quickly, and nothing more.

As a former editor building a career in data science, and a professional who has already sat through a few too many eye-glazing presentations, I want to share some tips on communicating your ideas quickly and efficiently.

1. An Eighth-Grade Data Level

Yes, this is the classic “know your audience” bit. Journalists in the U.S. are generally taught to write at an eighth-grade reading level. For data we must do the same, scaling our technical jargon down to a reasonable base level.

Just as journalists assume the average person knows a couple of ten-dollar words, a data scientist can generally expect that their audience has a decent grounding in the simplest data terms. They will likely have heard of machine learning and linear regression, but perhaps not be familiar with ARIMA or Markov chain Monte Carlo algorithms.

This is a general rule, however, and will vary by audience. If you are presenting to or expecting to be read by other data professionals, you can of course load up your work with the juicy details and fancy packages you used. Conversely, if your audience’s data literacy is on the lower end, you can knock your explanations down another level or two to match. If in doubt, go with the lower level; you can always explain in greater detail later on.

2. The Inverted Pyramid

News stories are structured to get information across quickly. You don’t need to read the entire article to know that the stock market dropped 200 points, but you can read on to learn why and how. Your work should be structured the same way.

When relaying information we often mirror our own thought processes: I thought this, then this, then that, and now I’ve arrived at this. This is ineffective and often hard to follow. In journalism we call it “burying the lede.” By sharing the most important details first, your audience is better able to contextualize the supporting information that comes after. For a news story, those details are the who, what, where, when, why, and how.

In data science this will typically just be the “what” and “why.” Lead with what you have found or what you are recommending, what the impact of that conclusion is, and a brief description of how you arrived at it. After that you can delve deeper into the “how” as your audience’s technical level and your time allow.

Details should flow from most important to least, in a sort of inverted pyramid of importance, with the very least important information you want to include at the end.

3. Trim the Fat

“Be sincere, be brief, be seated.”
– Franklin D. Roosevelt

“Brevity is the soul of wit.”
– Shakespeare

The world is full of colorful expressions that all mean the same thing: keep it short and simple. Your inverted pyramid does not have to be the Great Pyramid.

In journalism this means excluding anything that doesn’t add to a reader’s understanding of the story. The reader doesn’t need to know that it was a cold day in the capital unless someone slipped on the ice. In data science the same holds true, not just for written or spoken details but for visuals as well.

Exclude anything – any tools, any techniques, any data points – that is not strictly necessary to making your point. For visualizations, this means trimming your timeframe to only the relevant dates, and cleaning out unnecessary labels, dimensions, or annotations. It means making your charts as simple as they can possibly be while still telling your story accurately. If you can, limit the bulk of your argument to one or two perfect, highly impactful visualizations.
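As a rough illustration of trimming before you chart, here is a minimal Python sketch. The data, the field names, and the helper `trim_to_window` are all hypothetical and made up for this example; the idea is only that you cut the rows down to the relevant dates and the fields down to the one measure your point depends on before anything reaches a plotting tool.

```python
from datetime import date


def trim_to_window(rows, start, end, keep):
    """Keep only rows dated within [start, end] and only the fields in `keep`."""
    return [
        {field: row[field] for field in keep}
        for row in rows
        if start <= row["date"] <= end
    ]


# Hypothetical metrics: several measures over several months.
rows = [
    {"date": date(2023, 1, 1), "revenue": 100, "sessions": 900, "bounce_rate": 0.41},
    {"date": date(2023, 3, 1), "revenue": 120, "sessions": 950, "bounce_rate": 0.40},
    {"date": date(2023, 3, 15), "revenue": 180, "sessions": 1400, "bounce_rate": 0.38},
    {"date": date(2023, 6, 1), "revenue": 175, "sessions": 1350, "bounce_rate": 0.39},
]

# Assumption for the example: the story is about revenue in March only,
# so every other month and every other measure is left out of the chart data.
chart_data = trim_to_window(
    rows,
    start=date(2023, 3, 1),
    end=date(2023, 3, 31),
    keep=("date", "revenue"),
)
```

Whatever charting library you use then only ever sees the two March rows and the single measure you are talking about, which makes it harder to clutter the final visual by accident.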

I’ll take my own advice and close out here. Next time you’re relaying what you’ve found through your data science wizardry, remember the rules above. Your audience will thank you.

I whole-heartedly agree.
A professor of mine used to drill us to “first state the theorem, then give the proof”. I’ve noticed how counter-intuitive this is for nearly everyone–as you said, we tend to recreate our thought process for the audience, when in reality they want to start at the end and then go back to how we reached the conclusion.
Another analogy: imagine how much more comfortable it is to receive driving directions when you know the destination.

Great article. A press officer I used to work with used to say: “First answer the question, then tell me why that’s your answer.” Work on the basis that most people aren’t actually interested in the thought process that got you to the answer, but you need to have it for the ones that do want to keep reading. It might just need to be in an annex to the report, a methods note or somewhere else which is transparent, accessible but not cluttering up the main story.

Note: "Be sincere
A very practical and useful article. I will share it with my students!
Thank you.

Thanks for the spot – it has been fixed.

I hope your students find it useful!

This is really true, thank you for explaining. I had fallen into this trap when I was showing how we had changed this and where we had reached. But it went totally differently and the audience felt bored :yawning_face:. Later I came to know that if I had gone in the opposite direction, with less detail but the most important points, it would have been easier for the audience to grasp.

Thank you for sharing in detail @Brettnroberts