Category Archives: Database

Top 3 NoSQL Databases Features

A NoSQL database provides an effective mechanism for storing and retrieving data. These databases excel in both speed and volume, and they are increasingly being considered a viable alternative to relational databases. NoSQL is especially useful when working with very large quantities of data. Let us discuss some of the top NoSQL databases and their role in data management systems:

NoSQL Databases

Apache Cassandra

Apache Cassandra is designed to handle huge amounts of data spread across commodity servers. It was developed at Facebook, where it powered the inbox search feature, and was released in 2008 as an open source distributed database management system. Its data model is column-oriented, with columns grouped into column families. It provides a highly available service through replication and tunable consistency. Many companies use Apache Cassandra as a back-end database. Its most notable features include a BigTable-style data model and the Gossip protocol, which nodes use to exchange cluster state. Its strong points are high availability with no single point of failure, a NoSQL column family implementation, a flexible schema, an SQL-like query language (CQL), search through secondary indexes, and support for replication. Apache Cassandra is a strong choice when an application needs scalability and performance.
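To make the column family and secondary index ideas concrete, here is a minimal in-memory sketch in Python. It is a toy model only: the `ColumnFamily` class, its methods, and the sample rows are hypothetical illustrations, not Cassandra's API or storage engine, and it ignores distribution and replication entirely.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy stand-in for a column family (hypothetical, not Cassandra's API)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}     # row key -> {column name: value}
        self.indexes = {}  # indexed column -> {value: set of row keys}

    def create_index(self, column):
        # Build a secondary index over rows that already have this column.
        index = defaultdict(set)
        for key, columns in self.rows.items():
            if column in columns:
                index[columns[column]].add(key)
        self.indexes[column] = index

    def insert(self, key, **columns):
        # Flexible schema: each row may carry a different set of columns.
        row = self.rows.setdefault(key, {})
        for column, value in columns.items():
            if column in self.indexes and column in row:
                # Drop the stale index entry before overwriting the value.
                self.indexes[column][row[column]].discard(key)
            row[column] = value
            if column in self.indexes:
                self.indexes[column][value].add(key)

    def get(self, key):
        # Return a copy of the row, or an empty dict if the key is unknown.
        return dict(self.rows.get(key, {}))

    def find_by(self, column, value):
        # Look up row keys through the secondary index instead of scanning.
        if column not in self.indexes:
            raise KeyError(f"no secondary index on {column!r}")
        return sorted(self.indexes[column].get(value, set()))


users = ColumnFamily("users")
users.create_index("city")
users.insert("alice", city="Paris", email="alice@example.com")
users.insert("bob", city="Paris")
users.insert("carol", city="Oslo", phone="555-0100")
users.insert("bob", city="Oslo")  # update: bob moves from Paris to Oslo
```

Each row holds only the columns it was given, which is what a flexible schema means here, and `find_by` answers a query on a non-key column without scanning every row.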

Free Dataset Repositories for Data Mining and Visualizations

People working in databases, data mining, data visualization and business intelligence need datasets (sets of data) to implement, run and test their algorithms. There are many resources on the internet where you can get synthetic and real datasets for free. Some of these are benchmark datasets for testing algorithm performance against industry standards. Here I am documenting some of the free resources on the internet that will help you in your data search for academic and industry needs. I will try to keep this list updated over time.

P.S.: Before using any of the datasets mentioned below, please read their respective usage policies.

  • KDD Cup Datasets – The KDD Cup is the annual competition of the well-known KDD knowledge discovery conference, which releases its data for researchers and academia.
  • LETOR – This benchmark dataset from Microsoft is used for training, testing and validating Learning to Rank algorithms (used for search engines).
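As a rough illustration of working with a Learning to Rank dataset, the sketch below parses one record, assuming the SVMlight-style layout commonly used for such files: a relevance label, a `qid:` query identifier, `feature:value` pairs, and an optional `#` comment. The function name `parse_letor_line` and the sample line are made up for illustration; check the dataset's own documentation for its exact format.

```python
def parse_letor_line(line):
    """Parse one record, assuming '<label> qid:<id> <fid>:<value> ... # comment'."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = int(tokens[0])                # relevance judgment for this document
    qid = tokens[1].split(":", 1)[1]      # query the document was retrieved for
    features = {}
    for token in tokens[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return label, qid, features, comment.strip()


# Hypothetical sample record, not taken from any real dataset.
sample = "2 qid:10 1:0.5 2:0.25 3:1.0 #docid = D1"
label, qid, features, comment = parse_letor_line(sample)
```

Grouping the parsed records by `qid` is usually the next step, since ranking algorithms compare documents retrieved for the same query.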