In reviewing a new small-scale study of the Moderna vaccine, I found this chart:

This style of chart is quite common in scientific papers, and it is horrible. It irks me to think that some authors are forced to adopt such styles.

The study's main goal is to compare two half doses to two full doses of the Moderna vaccine. (To understand the science, read the post on my book blog.) The participants were stratified by age group. The vaccine is expected to work better for younger people than for older people. The point of the study isn't to measure the difference by age group, and so the age-group dimension is secondary.

Upon recognizing that, I reduce the number of colors from 4 to 2:

Halving the number of colors presents no additional difficulty. The reader spends less time cross-referencing.

The existence of the Pbo (placebo) and Conv (convalescent plasma) columns on the sides is both unsightly and suboptimal. The "Conv" serves as a reference level for the amount of antibodies the vaccine stimulates in people. A better way to display reference levels is using reference lines.

The biggest problem with the chart is the log scale on the vertical axis. This isn't even a log-10 scale but a log-2 scale. (Each tick is a doubling of value.)

Take the first set of columns as an example. The second column is clearly less than twice the height of the first column, and yet 25 is about 3.6 times as large as 7. The third column is also visually less than double the size of the second column, and yet 189 is about 7.6 times as large as 25. The areas (heights) of the columns do not convey the right information about the relative sizes of the underlying data.
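To put numbers on that mismatch, here is a minimal sketch. It uses the three values cited above and assumes the columns rise from a baseline of 1 on the log-2 axis. That baseline is an assumption for illustration, not something read off the original chart, and the function names are hypothetical.

```python
import math

# Values cited above for the first set of columns.
values = [7, 25, 189]

# ASSUMPTION: the columns start at a baseline of 1 on the log-2 axis.
# A different baseline changes the drawn heights, not the general point.
BASELINE = 1


def drawn_height(value, baseline=BASELINE):
    """Height of a column on a log-2 axis, in ticks (doublings) above the baseline."""
    return math.log2(value / baseline)


def compare(smaller, larger):
    """Return (ratio of the data values, ratio of the drawn column heights)."""
    return larger / smaller, drawn_height(larger) / drawn_height(smaller)


for a, b in zip(values, values[1:]):
    data_ratio, height_ratio = compare(a, b)
    print(f"{a} -> {b}: data ratio {data_ratio:.2f}, drawn height ratio {height_ratio:.2f}")
```

Under that assumed baseline, each column is drawn well under twice the height of its neighbor even though the data grow several-fold at each step.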

Here's an amusing observation. If we reverted the chart to a linear scale, the brown area shaded below would take up half of the entire chart. And yet there is not a single data point above 250, so the brown area is entirely empty.

An effect of a log scale is to compress the larger values of a dataset. That's what you're seeing here.
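The amount of compression is easy to work out. The sketch below compares how much of the axis the top doubling occupies on each scale. The axis limits of 1 and 512 are hypothetical, chosen only so that the top doubling starts near 250; they are not taken from the study's chart.

```python
import math

# ASSUMPTION: hypothetical axis limits, not read from the original chart.
AXIS_MIN, AXIS_MAX = 1, 512


def share_of_axis(lo, hi, axis_min=AXIS_MIN, axis_max=AXIS_MAX):
    """Fraction of the axis length covered by [lo, hi] on a linear and a log-2 scale."""
    linear = (hi - lo) / (axis_max - axis_min)
    log2 = (math.log2(hi) - math.log2(lo)) / (math.log2(axis_max) - math.log2(axis_min))
    return linear, log2


linear_share, log_share = share_of_axis(256, 512)
print(f"top doubling: {linear_share:.0%} of a linear axis, {log_share:.0%} of a log-2 axis")
```

With these assumed limits, the range from 256 to 512 fills about half of a linear axis but only one ninth of the log-2 axis.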

I now revisualize using dotplots:

The version on the left retains the log scale while the right one (pun intended) reverts to the linear scale.

The biggest effect by far is the spike of antibodies between days 29 and 43, which is after the second shot is administered. (For Moderna, the second shot is targeted for day 28.) In fact, it is during that window that the level of antibodies went from below the "Conv" level (i.e. the level from natural infection) to far above it.

The log-scale version buries this finding because it squeezes the large numbers on the chart. In addition, it artificially pulls the small numbers toward the "Conv" level. On the right chart, the second dot for 18-54, full doses is only at half the level of "Conv", but it looks tantalizingly close to the "Conv" level on the left chart.
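The pull toward the reference line follows directly from the arithmetic, as this small sketch shows. The reference level of 100 is a placeholder, not the study's "Conv" value, and the function names are hypothetical.

```python
import math


def linear_shortfall(value, reference):
    """Gap below the reference as a share of the reference level (linear reading)."""
    return (reference - value) / reference


def log2_gap_in_ticks(value, reference):
    """Gap below the reference in log-2 ticks; one tick is one doubling."""
    return math.log2(reference / value)


# ASSUMPTION: 100 is a placeholder reference level, not the study's "Conv" value.
reference = 100.0
value = reference / 2  # a dot sitting at half the reference level

print(f"linear shortfall: {linear_shortfall(value, reference):.0%}")
print(f"log-2 gap: {log2_gap_in_ticks(value, reference):.1f} tick(s)")
```

A value at half the reference is always exactly one tick below it on a log-2 axis, whatever the reference level is. On an axis that spans many doublings, one tick is a small step, so the dot lands just under the line.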

The authors of the study also claim that there is negligible dropoff by 30 days after the second dose, i.e. between the third and fourth dots in each set. That may be so on the log-scale chart but on the linear chart, we see a moderate reduction. I don't believe the size of this study allows us to make a stronger conclusion but the claim of no dropoff is dubious.

The left chart also obscures the age-group differences. It appears as if all four sets show roughly the same pattern. With the linear scale, we notice that the vaccine clearly works better for the younger subgroup. As I discussed on the book blog, no one actually knows what level of antibodies constitutes "protection," and so I can't say whether that age-group difference has practical significance.

***

I recommend using log scales sparingly and carefully. They are a source of much mischief and misadventure.
