Free Dataset Repositories for Data Mining and Visualization

People working in databases, data mining, data visualization and business intelligence need datasets to implement, run and test their algorithms. Many resources on the internet offer synthetic and real datasets for free. Some of them are benchmark datasets, which let you compare an algorithm's performance against industry standards. Here I am documenting some of these free resources to help with your data search, whether for academic or industry needs. I will try to keep this list updated over time.

P.S.: Before using any of the datasets mentioned below, please read their respective usage policies.

  • KDD Cup Datasets – The KDD Cup is a well-known data mining competition held alongside the ACM SIGKDD knowledge discovery conference, and its datasets are released for researchers and academia.
  • LETOR – A benchmark collection from Microsoft for training, validating and testing Learning to Rank algorithms (used in search engines).
  • Yahoo Webscope – Data that Yahoo has made available for a range of needs, for example language data, graph and social data, ratings data, advertising and market data, and competition data.
  • InfoChimps – A search engine for datasets, covering a large variety of free and paid data.
  • Reddit Opendata – A Reddit community that shares news about open datasets.
  • Google Public Data Explorer – Gives you access to various government and public datasets and lets you visualize them in different ways. Each visualization links to the official source page, where you can download the underlying data.
  • FIMI repository – The Frequent Itemset Mining Implementations repository. Most of its datasets are suited to frequent pattern mining.
  • UCI Machine Learning Repository – Contains databases and database generators contributed by many people over time. At the time of writing it holds 199 datasets.
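As a small illustration of putting frequent pattern mining data like the FIMI sets to use, here is a minimal Python sketch that reads transactions and counts how many transactions contain each item. It assumes one transaction per line with whitespace-separated integer item IDs, so check the actual format of whatever file you download. The function names `read_transactions` and `frequent_items` are hypothetical. The code covers only the first counting pass of a frequent itemset miner such as Apriori, not a full implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_transactions(lines: Iterable[str]) -> List[List[int]]:
    """Parse transactions, assuming one transaction per line made of
    whitespace-separated integer item IDs (an assumed layout)."""
    transactions = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            transactions.append([int(tok) for tok in tokens])
    return transactions


def frequent_items(transactions: List[List[int]], min_support: int) -> Dict[int, int]:
    """Return items whose support (number of transactions containing
    them) is at least min_support. This is only the 1-itemset pass."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(set(t))  # count each item at most once per transaction
    return {item: c for item, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a downloaded file.
    sample = "1 2 3\n2 3 4\n1 3\n3 4 5\n"
    txns = read_transactions(sample.splitlines())
    print(frequent_items(txns, min_support=2))
```

To use it on a real file, pass an open file handle to `read_transactions` instead of the sample string. Counting each item once per transaction via `set(t)` keeps the support count correct even if an item is repeated on a line.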

You can also check out KDnuggets for its collection of datasets. Note: not all of the datasets listed there are free.

Please help me keep this list current by letting me know about other free datasets.