Data Sources

Data on loans issued through the Paycheck Protection Program

The Paycheck Protection Program was established to provide aid to small businesses. It’s a $669-billion loan program. The data for 4.8 million loans, amounting to $521 billion so far, is now available from the Small Business Administration. For loans of less than $150,000, you can download the data for each state individually. Data for loans of more than $150,000 can be downloaded as a single file. Look up business name, type,...

What the federal government has been buying and where from

The Federal Procurement Data System tracks federal contracts of $10,000 or more. For ProPublica, Moiz Syed and Derek Willis made the data for coronavirus-related contracts more accessible with a searchable database. Browse the items, the companies, and the amounts. Somehow it seems like so much, and yet so not enough. See also the accompanying article highlighting some of the more questionable contracts. Tags: coronavirus, procurement, ProPublica

Coronavirus data at the state and county level, from The New York Times

Comprehensive national data on Covid-19 has been hard to come by through government agencies. The New York Times released its own dataset and will be updating it regularly: The tracking effort grew from a handful of Times correspondents to a large team of journalists that includes experts in data and graphics, staff news assistants and freelance reporters, as well as journalism students from Northwestern University, the University of Missouri and the...

Restaurant struggles

The restaurant industry is taking a big hit right now, as most people are staying put at home. OpenTable provides a downloadable dataset to show how much restaurant dining is down: This data shows year-over-year seated diners at restaurants on the OpenTable network across all channels: online reservations, phone reservations, and walk-ins. For year-over-year comparisons by day, we compare to the same day of the week from the same week...

Nationwide database of credibly accused Catholic clergy

For ProPublica, Ellis Simani and Ken Schwencke compiled an interactive database that you can search: ProPublica reporters spent months collecting the lists as they were originally released by each diocese. They then made them searchable via a public database in order to provide victims of clerical abuse and members of the public a way to search across all of the released lists. More than 6,700 names are included in the...

Dataset for rejected license plate applications

Noah Veltman just posted a dataset of 23,463 personalized license plate applications that were flagged for additional review by the state of California from 2015 to 2016. Casually scroll through the plates people requested and the reasons they were flagged, and you’ll find a goldmine of amusement. Veltman writes: This data was parsed from a set of 458 Excel workbooks that the DMV prepared for someone else’s public records request. I...

Google Dataset Search moves out of beta

Over a year ago, Google released Dataset Search in public beta. The goal was to index datasets across the internets to make them easier to find. It has now come out of beta: Based on what we’ve learned from the early adopters of Dataset Search, we’ve added new features. You can now filter the results based on the types of dataset that you want (e.g., tables, images, text), or whether the dataset...

Scripts from The Office, the dataset

The decade is almost done. You’re sitting there and you’re thinking: “I wish I could easily access the scripts from all seasons of The Office so that I could analyze the dialogue and relationships between characters.” Well, your wish is granted. Bradley Lindblad stuck all the scripts in an R package. It’s called schrute. Take that, 2019. Tags: R, scripts, The Office

Deaths from child abuse, a starting dataset

Under the Child Abuse Prevention and Treatment Act (CAPTA), ProPublica and The Boston Globe requested records from each state. They compiled the many documents into a single dataset: In each record, CAPTA requires states to list the age and gender of the child, and information about a household’s prior contact with welfare services. The information is supposed to help government agencies prevent child abuse, neglect and death, but reporting...

Sephora dataset is a collection of makeup reviews that mention crying

Interested in reviews on the Sephora website for waterproof makeup, Connie Ye figured she might as well scrape all of the reviews and filter for the ones that mention crying: I ended up scraping about ~5k reviews, and 105 of them mentioned crying, sobbing or tears, giving a ratio of about 1/50. This is of course a biased number because the products the reviews are for are meant to withstand...
