Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Data Platforms
https://github.com/quiltdata/quilt
https://registry.opendata.aws/
https://www.google.com/publicdata/directory
https://proxycrawl.com
Data for Computer Vision
Data for NLP
Data for Learning to Rank
Financial Data
Curated Datasets @ Observable
Education
Educational Demographic and Geographic Estimates
Misc
https://divvy-tripdata.s3.amazonaws.com/index.html
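The Divvy link above points at a public S3 bucket. As a minimal sketch, assuming the bucket root serves the standard S3 `ListBucketResult` XML listing, the archive names can be extracted with the Python standard library alone. The sample listing and its file names are made up for illustration, and `list_keys` is a hypothetical helper, not part of any Divvy tooling.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket-listing (ListBucketResult) documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_keys(listing_xml, suffix=".zip"):
    """Return the object keys in an S3 listing document that end with `suffix`."""
    root = ET.fromstring(listing_xml)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    return [k for k in keys if k and k.endswith(suffix)]


# Illustrative listing only; the bucket name and keys are hypothetical.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>trips-2020-01.zip</Key><Size>1024</Size></Contents>
  <Contents><Key>index.html</Key><Size>512</Size></Contents>
</ListBucketResult>"""

print(list_keys(SAMPLE_LISTING))
```

To try this against a real bucket, the listing text would first have to be downloaded (for example with `urllib.request.urlopen`) and passed in as a string. Large buckets split their listing across several pages, which this sketch does not handle.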
Data Sources
The US National Center for Education Statistics
Data on educational institutions and education demographics from the US and around the world.
The UK Data Centre
The UK’s largest collection of social, economic and population data.
FiveThirtyEight
A large number of polls providing data on public opinion of political and sporting issues.
FBI Uniform Crime Reporting
The FBI is responsible for compiling and publishing national crime statistics, with free data available at national, state and county level.
Bureau of Justice
Here you can find data on law enforcement agencies, jails, parole and probation agencies and courts.
Qlik Data Market
Offers a free package with access to datasets covering world population, currencies, development indicators and weather data.
NASA Exoplanet Archive
Public datasets covering planets and stars gathered by NASA’s space exploration missions.
UN Comtrade Database
Statistics compiled and published by the United Nations on international trade. Includes Comtrade Lab, a showcase of how cutting-edge analytics and tools are used to extract value from the data.
Financial Times Market Data
Up-to-date information on financial markets from around the world, including stock price indexes, commodities and foreign exchange.
Google Trends
Examine and analyze data on internet search activity and trending news stories around the world.
Twitter
Most conversations on Twitter are public, which means that huge amounts of data are available through its API on who is talking about what, where, when and why.
Google Scholar
Entire texts of academic papers, journals, books and legal case law.
Instagram
As with Twitter, Instagram posts and conversations are public by default. Its APIs allow likes, mentions and business details to be analyzed.
OpenCorporates
The world’s largest open database of companies.
Glassdoor API
Information about job vacancies, candidates, salaries and employee satisfaction is available through their developer API.
IMDB Datasets
Datasets in a number of formats drawn from the web’s largest resource on movies, television and people working in those industries.
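As a rough sketch of reading such a dump, assume the files are gzip-compressed, tab-separated, with a header row and `\N` marking missing values. That is the layout IMDb documents for its non-commercial datasets, but verify it before relying on it. The column names in the sample are illustrative, and `read_tsv_gz` is a hypothetical helper.

```python
import csv
import gzip
import io


def read_tsv_gz(raw_bytes, missing="\\N"):
    """Yield one dict per row from gzip-compressed TSV bytes.

    Cells equal to `missing` are converted to None.
    """
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: treat quote characters in titles as ordinary text.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {k: (None if v == missing else v) for k, v in row.items()}


# Tiny in-memory sample; the columns and values are illustrative only.
SAMPLE = gzip.compress(
    (
        "id\ttitle\tyear\n"
        "t1\tExample Film\t1999\n"
        "t2\tUntitled\t\\N\n"
    ).encode("utf-8")
)

rows = list(read_tsv_gz(SAMPLE))
print(rows)
```

For a downloaded file, the same reader logic applies with `gzip.open(path, "rt", ...)` in place of the in-memory bytes. Streaming row by row keeps memory use flat on multi-gigabyte dumps.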
OpenLibrary Data Dumps
Datasets on books, including catalogs from libraries around the world.
Labeled Faces in the Wild
13,000 collated and labeled images of human faces, for use in developing applications involving facial recognition.
MS MARCO (Microsoft Machine Reading Comprehension)
Microsoft’s open machine learning datasets for training systems in reading comprehension and question answering.
Machine Learning Dataset Repository
Collection of open datasets contributed by data scientists involved in machine learning projects.
eBay Market Data Insights
Data on millions of online sales and auctions from eBay.
Natural History Museum Data Portal
Information on nearly 4 million historical specimens in the London museum’s collection, as well as scientific sound recordings of the natural world.
CERN Open Data
More than one petabyte of data from particle physics experiments carried out by CERN.
One Million Audio Cover Images
Dataset hosted at archive.org covering music released around the world, for use in image processing research.
Complete Public Reddit Comments Corpus
Over one billion public comments posted to Reddit between 2007 and 2015, for training language algorithms.
Microsoft Azure Data Markets Free Datasets
Freely available datasets covering everything from agriculture to weather.
Irish Electric Vehicle Charge Point Status
Collates data from the body which oversees the network of EV charge points across the Republic of Ireland and Northern Ireland.
LondonAir
Pollution and air quality data from across London.
References
https://www.forbes.com/sites/bernardmarr/2018/02/26/big-data-and-ai-30-amazing-and-free-public-data-sources-for-2018/#116850d65f8a
https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#1cd2291ab54d
https://www.columnfivemedia.com/100-best-free-data-sources-infographic
https://infogram.com/blog/free-data-sources/