Vector paths of meaning between words and phrases

Benjamin Schmidt, an assistant professor of history at Northeastern University, explored the space between words and drew the paths to get from one word to another. The above, for example, is the path between Seinfeld and Breaking Bad. Using Google News as the corpus, the steps: Take any two words. I used “duck” and “soup” for my testing. Find a word that is, in cosine distance, between the two words:...
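
The excerpt cuts off before the remaining steps, so here is only a rough sketch of the one step it does describe: finding a word that sits between two others in cosine terms. It assumes "between" means closest to the midpoint of the two normalized word vectors, which is one plausible reading and not necessarily Schmidt's. The function name `word_between` and the three-dimensional `toy_vectors` are made up for illustration; real vectors would come from a model trained on a corpus such as Google News.

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1 so only its direction matters."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(unit(a), unit(b)))


def word_between(start, end, vectors):
    """Return the word whose vector points closest to the midpoint direction.

    Assumption: "between" is read as nearest (by cosine similarity) to the
    sum of the two normalized endpoint vectors.
    """
    midpoint = unit(vectors[start]) + unit(vectors[end])
    candidates = [w for w in vectors if w not in (start, end)]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], midpoint))


# Hypothetical toy vectors, invented for this sketch.
toy_vectors = {
    "duck": [1.0, 0.1, 0.0],
    "soup": [0.1, 1.0, 0.0],
    "stew": [0.6, 0.7, 0.1],
    "pond": [0.9, 0.2, 0.1],
    "car": [0.0, 0.0, 1.0],
}

middle = word_between("duck", "soup", toy_vectors)
print(middle)
```

Repeating the same search on each half (start to middle, middle to end) would produce a longer chain of words, though whether the original project recurses this way is not shown in the excerpt.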

Playfulness in data visualization

The Newslab project takes aggregate data from Google's various services and finds imaginative ways to enliven the data. The Beautiful in English project makes a strong case for adding playfulness to your data visualization. The data came from Google Translate. The authors look at 10 languages, and the top 10 words users ask to translate from those languages into English. The first chart focuses on the most popular word for...

Lines, gridlines, reference lines, regression lines, the works

This post is part 2 of an appreciation of the chart project by Google Newslab, advised by Alberto Cairo, on the gender and racial diversity of the newsroom. Part 1 can be read here. In the previous discussion, I left out the following scatter bubble plot. This plot is available in two versions, one for gender and one for race. The key question being asked is whether the leadership in...

A look at how the New York Times readers look at the others

The above chart got some mileage on my Twitter feed when it was unveiled at the end of November last year. A reader, Eric N., didn't like it at all, and I think he has a point. Here are several debatable design decisions. The chart uses an inverted axis. A tax cut (negative growth) is shown on the right while a tax increase is shown...

Wheel of fortune without prizes: the negative report about negativity

My friend, Louis V., handed me a report from Harvard's Shorenstein Center, with the promise that I could make a blog post or two from it. And I wasn't disappointed. This report (link) caught some attention a few months ago because of the click-bait headline that the media is "biased" against Trump in his first 100 days. They used the most naive definition of "bias". The metric is the amount...

Subreddit math with r/The_Donald helps show topic breakdowns

Trevor Martin for FiveThirtyEight used latent semantic analysis to do math with subreddits, specifically r/The_Donald. We’ve adapted a technique that’s used in machine learning research — called latent semantic analysis — to characterize 50,323 active subreddits based on 1.4 billion comments posted from Jan. 1, 2015, to Dec. 31, 2016, in a way that allows us to quantify how similar in essence one subreddit is to another. At its heart,...
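
For readers who have not met latent semantic analysis before, a generic sketch may help. This is not the FiveThirtyEight pipeline, whose details are cut off in the excerpt. It assumes a small matrix of comment counts (rows are subreddits, columns are commenters), reduces it with a truncated singular value decomposition, and compares rows by cosine similarity. The subreddit names, the counts, and the helper `lsa_embed` are all hypothetical.

```python
import numpy as np

# Hypothetical data: rows are subreddits, columns are four commenters,
# and each cell counts that commenter's comments in that subreddit.
names = ["r/example_a", "r/example_b", "r/example_c"]
counts = np.array(
    [
        [5, 4, 0, 0],
        [4, 5, 0, 1],
        [0, 0, 6, 5],
    ],
    dtype=float,
)


def lsa_embed(matrix, k):
    """Keep the top-k singular directions and return one k-dim row vector per subreddit."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


embeddings = lsa_embed(counts, k=2)
sim_ab = cosine(embeddings[0], embeddings[1])  # subreddits sharing commenters
sim_ac = cosine(embeddings[0], embeddings[2])  # subreddits with no overlap

print(f"{names[0]} vs {names[1]}: {sim_ab:.3f}")
print(f"{names[0]} vs {names[2]}: {sim_ac:.3f}")
```

Once every subreddit is a vector, "subreddit math" amounts to adding or subtracting those vectors and looking up the nearest rows to the result, using the same cosine comparison.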

Lines that delight, lines that blight

This WSJ graphic caught my eye. The accompanying article is here. The article (judging from the sub-header) makes two separate points, one about the total amount of money raised in IPOs in a year, and the change in market value of those newly-public companies one year from the IPO date. The first metric is shown by the size of the bubbles while the second metric is displayed as distances from...

Sorting out the data, and creating the head-shake manual

Yesterday's post attracted a few good comments. Several readers don't like the data used in the NAEP score chart. The authors labeled the metric "gain in NAEP scale scores" which I interpreted to be "gain scores," a popular way of evaluating educational outcomes. A gain score is the change in test score between (typically consecutive) years. I also interpreted the label "2000-2009" as the average of eight gain scores, in...
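
To make the definition concrete, here is a small sketch of gain scores as described above: the change between consecutive scores, and the average of those changes. The numbers in `hypothetical_scores` are invented and are not NAEP data; the function names are mine.

```python
def gain_scores(scores):
    """Change in score between each pair of consecutive measurements."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def average_gain(scores):
    """Mean of the consecutive gain scores."""
    gains = gain_scores(scores)
    return sum(gains) / len(gains)


# Invented scale scores for four consecutive measurements (not real NAEP data).
hypothetical_scores = [220, 223, 225, 230]

print(gain_scores(hypothetical_scores))
print(average_gain(hypothetical_scores))
```

One thing worth noticing: consecutive gains telescope, so their average equals the last score minus the first, divided by the number of gains. The intermediate years drop out of the average entirely.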

How will the Times show election results next week? Will they give us a cliffhanger?

I don't know for sure how the New York Times will present election results next week; it's going to be as hard to predict as the outcome of the election! The Times just published a wonderful article describing all the different ways election results have been displayed in the past. tl;dr: The designer has to make hard choices. Some graphics are better at one thing but worse at another. If...

Confusion is not limited to complex dataviz

This chart looks simple and harmless but I find it disarming. I usually love the cheeky titles in the Economist but this title is very destructive to the data visualization. The chart has nothing to do with credit scores. In fact, credit scoring is associated with consumers while countries have credit ratings. Also, I am not a fan of the Economist way of labeling negative axes. The negative sign situated...
