Data Sources

31 posts
Datasets for teaching data science

Rafael Irizarry introduces the dslabs package for real-life datasets to teach data science: [I] try to avoid using widely used toy examples, such as the mtcars dataset, when I teach data science. However, my experience has been that finding examples that are both realistic, interesting, and appropriate for beginners is not easy. After a few years of teaching I have collected a few datasets that I think fit this criteria....

FiveThirtyEight datasets available for download

If you’re looking for some data to play with, FiveThirtyEight just made it easier to download their data and code. They’ve been on GitHub, I think from the beginning, but this data page is even more straightforward and to the point. Tags: FiveThirtyEight

World population estimator and gridded data from NASA

Population data typically comes in the context of boundaries. City data. County data. Country data. With their Population Estimate Service, NASA provides data at higher granularity. You can request estimated population in the context of a world grid. Here’s an interactive map to demonstrate the API. Click and drag a shape across any region in the world and get an estimate of the population within that shape. [via kottke] Tags:...

Data to identify Wikipedia rabbit holes

New data dump from the Wikimedia Foundation: The Wikimedia Foundation’s Analytics team is releasing a monthly clickstream dataset. The dataset represents—in aggregate—how readers reach a Wikipedia article and navigate to the next. Previously published as a static release, this dataset is now available as a series of monthly data dumps for English, Russian, German, Spanish, and Japanese Wikipedias. Tags: Wikipedia
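To make the "how readers reach an article and navigate to the next" idea concrete, here is a minimal Python sketch that ranks the most-followed links out of one article. It assumes each row is tab-separated with a referrer, a destination, a link type, and a count; that column layout, the sample rows, and the function name `top_next_articles` are illustrative assumptions, not something stated in the release note above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column layout (referrer, destination,
# link type, count) is an assumption made for this sketch.
SAMPLE = (
    "other-search\tRabbit\texternal\t120\n"
    "Rabbit\tRabbit_hole\tlink\t45\n"
    "Rabbit\tHare\tlink\t30\n"
    "Alice's_Adventures_in_Wonderland\tRabbit_hole\tlink\t15\n"
)


def top_next_articles(tsv_text, source, k=3):
    """Return up to k (destination, clicks) pairs reached from `source`."""
    counts = defaultdict(int)
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for prev, curr, _kind, n in reader:
        # Only keep rows where the reader came from the article of interest.
        if prev == source:
            counts[curr] += int(n)
    # Sort destinations by click count, largest first.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


result = top_next_articles(SAMPLE, "Rabbit")
print(result)
```

Following the top destination repeatedly from a starting article is one simple way to trace a possible rabbit hole, although a real monthly dump is large enough that you would likely stream the file rather than hold it in a string.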

Download comprehensive police shootings data

Data for police shootings is usually the subset that only includes fatalities. Vice News made requests nationwide to get data on people who were shot but not killed by police. To accompany their story, Vice News made the data and code available for download: Ultimately, we obtained some data from 47 departments — with 4,099 incidents in all. Departments in New York’s Suffolk and Nassau Counties didn’t provide us with...

Serial-killer detector

Alec Wilkinson, reporting for The New Yorker, profiled Thomas Hargrove, who is deep into finding serial killers algorithmically and through public data: Thomas Hargrove is a homicide archivist. For the past seven years, he has been collecting municipal records of murders, and he now has the largest catalogue of killings in the country—751,785 murders carried out since 1976, which is roughly twenty-seven thousand more than appear in F.B.I. files. States...

Searchable budget proposal and the 10-year change

The administration released a budget proposal yesterday, which, as you’d expect, contains some big shifts. The New York Times calculated “the changes over 10 years, compared with projected spending under current law” and made the numbers available in a searchable table. Tags: budget, government

Scrabble data and analysis

Looking for some data to play with? James P. Curley compiled Scrabble data using computer-played games in Quackle Scrabble. Check out his summary analysis or grab the data for yourself in the R package scrabblr. Tags: R, Scrabble

Easily download large-ish survey datasets

Many government organizations release microdata for surveys every year. It comes as anonymized responses from each survey participant for each question in said survey. However, those who want to use this data often run into the challenge of downloading and parsing. It’s rarely straightforward. So, Anthony Damico provides a big helping of R scripts to easily download data from a bunch of surveys. He calls the site Analyze Survey Data...

Newspaper endorsements since 1980

Noah Veltman put together a history of newspapers’ presidential endorsements since 1980 for about 100 publications. There’s a simple table showing Republican, Democrat, or other endorsement over the years, and you can download the data too. Tags: election, endorsement, newspaper
