Data Sources

Google search trends dataset for Covid-19 symptoms

Google released a search trends dataset earlier this month. Using this dataset, Adam Pearce made an explorer to compare search volume over time: The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for more than 400 health symptoms, signs, and conditions, such as cough, fever and difficulty breathing. The dataset provides a time series for each region showing the relative volume of searches for each symptom....

Friends sitcom transcript dataset

For your analytical perusal, Emil Hvitfeldt provides ten seasons’ worth of scripts from the Friends sitcom in an easy-to-use R package: The goal of friends [is] to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files. The ten seasons ran from...

Data on loans issued through the Paycheck Protection Program

The Paycheck Protection Program was established to provide aid to small businesses. It’s a $669-billion loan program. The data for 4.8 million loans, amounting to $521 billion so far, is now available from the Small Business Administration. For loans less than $150,000, you can download data for all states individually. Data for loans that were more than $150,000 can be downloaded as a single file. Look up business name, type,...

What the federal government has been buying and where from

The Federal Procurement Data System tracks federal contracts of $10,000 or more. For ProPublica, Moiz Syed and Derek Willis made the data for coronavirus-related contracts more accessible with a searchable database. Browse the items, the companies, and the amounts. Somehow it seems like so much, and yet so not enough. See also the accompanying article highlighting some of the more questionable contracts.

Tags: coronavirus, procurement, ProPublica

Coronavirus data at the state and county level, from The New York Times

Comprehensive national data on Covid-19 has been hard to come by through government agencies. The New York Times released its own dataset and will update it regularly: The tracking effort grew from a handful of Times correspondents to a large team of journalists that includes experts in data and graphics, staff news assistants and freelance reporters, as well as journalism students from Northwestern University, the University of Missouri and the...

Restaurant struggles

The restaurant industry is taking a big hit right now, as most people are staying put at home. OpenTable provides a downloadable dataset to show how much restaurant dining is down: This data shows year-over-year seated diners at restaurants on the OpenTable network across all channels: online reservations, phone reservations, and walk-ins. For year-over-year comparisons by day, we compare to the same day of the week from the same week...
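The year-over-year comparison described in that excerpt can be sketched roughly as follows. This is only an illustration: the excerpt does not say how OpenTable matches weeks across years, so the use of ISO week numbers, the fallback for week 53, and the function names `comparison_day` and `yoy_change` are all assumptions made here.

```python
from datetime import date


def comparison_day(d: date) -> date:
    """Return the prior-year date with the same ISO week number and weekday.

    Assumption: "same week" means same ISO week number. OpenTable's
    actual matching rule is not stated in the excerpt.
    """
    iso_year, iso_week, iso_weekday = d.isocalendar()
    try:
        return date.fromisocalendar(iso_year - 1, iso_week, iso_weekday)
    except ValueError:
        # The prior year has no week 53, so fall back to its week 52.
        return date.fromisocalendar(iso_year - 1, 52, iso_weekday)


def yoy_change(current: int, baseline: int) -> float:
    """Percent change in seated diners relative to the comparison day."""
    return (current - baseline) / baseline * 100.0


if __name__ == "__main__":
    day = date(2020, 3, 18)
    print(day, "is compared with", comparison_day(day))
    # Hypothetical diner counts, for illustration only.
    print(f"{yoy_change(50, 200):.1f}%")
```

Matching on weekday rather than calendar date keeps a Wednesday from being compared with, say, a busy Saturday a year earlier.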

Nationwide database of credibly accused Catholic clergy

For ProPublica, Ellis Simani and Ken Schwencke compiled an interactive database that you can search: ProPublica reporters spent months collecting the lists as they were originally released by each diocese. They then made them searchable via a public database in order to provide victims of clerical abuse and members of the public a way to search across all of the released lists. More than 6,700 names are included in the...

Dataset for rejected license plate applications

Noah Veltman just posted a dataset of 23,463 personalized license plate applications that were flagged for additional review by the state of California from 2015 to 2016. Casually scroll through the plates people requested and the reasons they were flagged, and you'll find a goldmine of amusement. Veltman writes: This data was parsed from a set of 458 Excel workbooks that the DMV prepared for someone else’s public records request. I...

Google Dataset Search moves out of beta

Over a year ago, Google released Dataset Search in public beta. The goal was to index datasets across the internets to make them easier to find. It has now come out of beta: Based on what we’ve learned from the early adopters of Dataset Search, we’ve added new features. You can now filter the results based on the types of dataset that you want (e.g., tables, images, text), or whether the dataset...

Scripts from The Office, the dataset

The decade is almost done. You’re sitting there and you’re thinking: “I wish I could easily access the scripts from all seasons of The Office so that I could analyze the dialogue and relationships between characters.” Well, your wish is granted. Bradley Lindblad stuck all the scripts in an R package. It’s called schrute. Take that, 2019.

Tags: R, scripts, The Office
