Michael Bales and his associates at Cornell are working on a new visual tool for citations data. This area is ripe for innovation: a lot of data is available, but it is difficult to gain insights from it. The prototypical question is how authoritative a particular researcher or research group is, judging from their publications.

A proxy for "quality" is the number of times a paper is cited by others. More sophisticated metrics take into account the quality of the researchers who cite one's work. Various summary statistics, such as the h-index, attempt to capture the data distribution, but reducing it to a single number may remove too much context.
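To make the "single number" concern concrete, here is a minimal sketch of the standard h-index definition (the largest h such that h papers have at least h citations each). The function name and the citation counts are made up for illustration; they are not from Michael's tool.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Two hypothetical researchers with very different citation profiles.
blockbusters = [100, 90, 3, 3, 3]  # two highly cited papers
steady = [3, 3, 3]                 # three modestly cited papers

print(h_index(blockbusters), h_index(steady))
```

Both hypothetical researchers end up with the same h-index, even though their citation distributions look nothing alike, which is the kind of context a single number hides.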

Contextual information is very important for interpretation: certain disciplines might enjoy higher average numbers of citations because researchers tend to list more references, or because papers typically have large numbers of co-authors; individual researchers may have a few influential papers, a lot of rarely-cited papers, or anything in between.

A good tool should be able to address a number of such problems.

Michael is a former student who attended the Data Visualization workshop at NYU (syllabus here), and the class spent some time discussing his citations impact tool. He contacted me to let me know that what we did during the workshop has now reached the research conferences.

Here is a wireframe of the visual form we developed:


This particular chart shows the evolution in citations data over three time periods for a specific sub-field of study. The vertical scale is a percentile ranking based on a standard used in the citations industry. We grouped the data into deciles (and within each decile, into thirds) to facilitate understanding. The median rank is highlighted: we can see that in this sub-field, the publications have increased both in quantity and in quality, with the median rank improving over the three periods. Because "review articles" are interpreted differently by some, those are highlighted in purple.
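The grouping described above can be sketched roughly as follows. This is only an illustration under assumptions of my own: percentile ranks run from 0 to 100, each decile is split into three equal-width bands, and the function names and sample data are hypothetical rather than taken from the actual tool.

```python
from collections import Counter
from statistics import median


def decile_and_third(percentile):
    """Map a percentile rank (0-100) to (decile 1-10, third 1-3)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    decile = min(int(percentile // 10), 9)        # 0..9; 100 joins the top decile
    within = percentile - decile * 10             # position inside the decile, 0..10
    third = min(int(within * 3 // 10), 2)         # 0..2
    return decile + 1, third + 1


def bin_counts(percentiles):
    """Count papers falling into each (decile, third) cell."""
    return Counter(decile_and_third(p) for p in percentiles)


def median_rank_by_period(periods):
    """Median percentile rank for each time period."""
    return {name: median(ranks) for name, ranks in periods.items()}


# Hypothetical percentile ranks for three time periods.
periods = {
    "period 1": [12, 35, 48],
    "period 2": [20, 41, 55, 63],
    "period 3": [33, 52, 61, 70, 88],
}
print(median_rank_by_period(periods))
```

In this made-up data, both the number of papers and the median rank rise from one period to the next, which is the pattern the chart is designed to reveal.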

One of the key strengths of this design is the filter mechanism shown on the right. The citations researcher can customize comparisons. This is really important because the citations data are meaningless by themselves; they only attain meaning when compared to peer groups.

Here is an even rougher sketch of the design:


For a single researcher, this view will list all of his or her papers, ordered by each paper's percentile rank, with review papers given a purple color.
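As a rough sketch of the data behind this view: the `Paper` record, the descending sort order, and the gray default color are all assumptions of mine; only the purple for review papers comes from the design.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    percentile: float      # percentile rank, assumed 0-100 with higher = better
    is_review: bool = False


def researcher_view(papers):
    """Order papers by percentile rank (best first) and assign a display color."""
    ordered = sorted(papers, key=lambda p: p.percentile, reverse=True)
    return [
        (p.title, p.percentile, "purple" if p.is_review else "gray")
        for p in ordered
    ]


# Hypothetical papers for one researcher.
papers = [
    Paper("Paper A", 42.0),
    Paper("Review B", 91.0, is_review=True),
    Paper("Paper C", 67.5),
]
for row in researcher_view(papers):
    print(row)
```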

The entire VIVO dashboard project by Weill Cornell Medicine has a GitHub page, but the citation impact tool does not seem to be there at the moment.


