In my just-published Long Read article at DataJournalism.com, I touched upon the subject of "How to Read this Chart".
Most data graphics do not come with directions for use because dataviz designers follow certain conventions. We do not need to tell you, for example, that time runs left to right on the horizontal axis (substitute right to left for those living in right-to-left countries). It's when we deviate from the norms that a "How to Read this Chart" box is called for.
A discussion over Twitter during the weekend on the following New York Times chart perfectly illustrates this issue. (The article is well worth reading to educate oneself on this red-hot public-health issue. I made some comments on the sister blog about the data a few days ago.)
Reading this chart, I quickly grasp that the horizontal axis is the speed of infection and the vertical axis represents the deadliness. Without being told, I used the axis labels (and some of you might notice the annotations with the arrows on the top right). But most people will likely miss - at a glance - that the vertical axis utilizes a log scale while the horizontal axis is linear (regular).
The effect of a log scale is to pull the large numbers toward the average while spreading the smaller numbers apart - when compared to a linear scale. So when we look at the top of the coronavirus box, it appears that this virus could be as deadly as SARS.
The height of the pink box is 3.9, while the gap between the top edge of the box and the SARS dot is 6. Yet our eyes tell us the top edge is closer to the SARS dot than it is to the bottom edge!
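To see why the eye is fooled, here is a rough sketch in Python that compares the two gaps on a linear scale and on a log (base 10) scale. The three fatality values are my assumptions, chosen only because they reproduce the 3.9 and 6 quoted above; they are not read off the chart's underlying data.

```python
import math

# Assumed values (in percent), picked only to match the height of 3.9
# and the gap of 6 mentioned in the text; not taken from the NYT data.
box_bottom = 0.1
box_top = 4.0
sars = 10.0

def linear_gap(low, high):
    """Distance between two values on a regular (linear) axis."""
    return high - low

def log_gap(low, high):
    """Distance between two values on a base-10 log axis."""
    return math.log10(high) - math.log10(low)

# On a linear axis, the box is shorter than the gap up to SARS.
print("linear:", round(linear_gap(box_bottom, box_top), 2),
      "vs", round(linear_gap(box_top, sars), 2))

# On a log axis, the order flips: the box looks much taller than the gap.
print("log:   ", round(log_gap(box_bottom, box_top), 2),
      "vs", round(log_gap(box_top, sars), 2))
```

Under these assumed values, the box is the smaller of the two distances on the linear scale but the larger one on the log scale, which is the distortion our eyes pick up.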
There is nothing inaccurate about this chart - the log scale introduces such distortion. The designer has to make a choice.
Indeed, there were two camps on Twitter, arguing for and against the log scale.
I use log scales a lot in analyzing data, but tend not to use log scales in a graph. It's almost a given that using the log scale requires a "How to Read this Chart" message. And the NY Times crew delivers!
Right below the chart is a paragraph:
To make this even more interesting, the horizontal axis is a hidden "log" scale. That's because infections spread exponentially. Even though the scale is not labeled "log", think of it as if the large values have been pulled toward the middle.
Here is an over-simplified way to see this. A disease that spreads at a rate of fifteen people at a time is not 3 times worse than one that spreads five at a time. In the former case, the first sick person transmits it to 15, and then each of the 15 transmits it to 15 others. Thus, after two steps, 241 people have been infected (225 + 15 + 1). In the latter case, it's 5x5 + 5 + 1 = 31 infections after two steps. So at this point, the number of infected is already about 8 times worse (241 / 31), not 3 times. And the gap keeps widening with each step.
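The same arithmetic can be carried a few more steps with a short Python sketch. It uses the same over-simplified model as above (every sick person infects a fixed number of new people at each step, and nobody recovers or is counted twice); the function name `total_infected` is my own, for illustration only.

```python
def total_infected(rate, steps):
    """Total people infected after `steps` rounds, counting the first case.

    Round k adds rate**k new infections, so the total is
    1 + rate + rate**2 + ... + rate**steps.
    """
    return sum(rate ** k for k in range(steps + 1))

for steps in range(1, 5):
    fast = total_infected(15, steps)  # spreads to fifteen at a time
    slow = total_infected(5, steps)   # spreads to five at a time
    print(f"step {steps}: {fast} vs {slow}, ratio {fast / slow:.1f}")
```

After two steps this gives the 241 and 31 from the text, and the ratio between the two diseases grows at every later step.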