Data Sources

6 posts
Database of feathers

There’s a database of feathers called Featherbase, because of course there is: Featherbase is a working group of German feather scientists and other collectors worldwide who came together with their personal collections and created the biggest and most comprehensive online feather library in the world. Using our website, it is possible to identify feathers from hundreds of different species, compare similarities between them, work out gender or age-specific characteristics and...

Scraping public data ruled legal

Zack Whittaker reports for TechCrunch: In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law. The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to...

1950 Census released by U.S. National Archives

For privacy reasons, there’s a 72-year restriction on individual Census records, which include names and addresses. Today marks 72 years since the 1950 Census was taken, so the records are now public. The scanned paper records are available for browsing and downloading. Tags: archive, census, history

World Bank’s Gender Data Portal

In an effort to make gender inequalities more obvious, the World Bank updated its Gender Data Portal: The World Bank Group has redesigned its Gender Data Portal with these audiences in mind by offering over 900 gender indicators in different formats, ranging from raw data to appealing visualizations and stories. Making sex-disaggregated data easier to analyze, interpret and visualize will bring into focus gender issues that are frequently invisible, including on...

More detailed data release from Census 2020

After a lot of angst over the past few years around undercount, representation, and anonymization, the Census Bureau released detailed data from the 2020 decennial census: The U.S. Census Bureau today released additional 2020 Census results showing an increase in the population of U.S. metro areas compared to a decade ago. In addition, these once-a-decade results showed the nation’s diversity in how people identify their race and ethnicity. “We are...

Money-in-politics nonprofits merge their datasets

The Center for Responsive Politics and the National Institute on Money in Politics are merging their datasets to make them more accessible: The nation’s two leading money-in-politics data organizations have joined forces to help Americans hold their leaders accountable at the federal and state levels, they said today. The combined organization, OpenSecrets, merges the Center for Responsive Politics (CRP) and the National Institute on Money in Politics (NIMP), each leading entities for...

Mining Parler data

Just before the social network Parler went down, a researcher who goes by the Twitter username @donk_enby scraped 56.7 terabytes of data from the site via a less-than-secure API. Motherboard reports on what some researchers are doing with the data: One technologist took the scraped Parler data, took every file that had GPS coordinates included within it, formatted that information into JSON, and plotted those onto a map. The technologist...
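
The "GPS coordinates into JSON, then onto a map" step is easy to picture with a small sketch. This is not the technologist's actual code. It assumes each scraped file has already been reduced to a record with hypothetical `id`, `lat`, and `lon` fields, and it uses GeoJSON as the JSON layout because most mapping tools read it directly. The sample records are made up.

```python
import json


def to_geojson(records):
    """Build a GeoJSON FeatureCollection from records that carry GPS coordinates."""
    features = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # skip files that have no GPS metadata
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": rec.get("id")},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample records; the field names are assumptions for illustration.
sample = [
    {"id": "a", "lat": 38.8899, "lon": -77.0091},
    {"id": "b"},  # no coordinates, so it is dropped
    {"id": "c", "lat": 40.7128, "lon": -74.0060},
]

collection = to_geojson(sample)
geojson_text = json.dumps(collection, indent=2)
```

Saving `geojson_text` to a `.geojson` file gives you something you can load as a point layer in most mapping tools.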

Data for all of the referee calls in NBA games

Owen Phillips compiled per-game and cumulative foul calls for all NBA referees between the 2016-17 and 2019-20 seasons. On its own, I’m not sure it’s that exciting, but if you’re into basketball analytics, it might be fun to tie in with other data. Tags: basketball, Owen Phillips, referee

Google search trends dataset for Covid-19 symptoms

Google released a search trends dataset earlier this month. Using this dataset, Adam Pearce made an explorer to compare search volume over time: The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for more than 400 health symptoms, signs, and conditions, such as cough, fever and difficulty breathing. The dataset provides a time series for each region showing the relative volume of searches for each symptom....

Friends sitcom transcript dataset

For your analytical perusal, Emil Hvitfeldt provides ten seasons’ worth of scripts from the Friends sitcom in an easy-to-use R package: The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files. The ten seasons ran from...
