Tuple MapReduce: beyond the classic MapReduce. It’s been some years now since Google wrote the paper [“MapReduce: Simplified Data Processing on Large Clusters“] in 2004.
In this paper Google presented MapReduce, a programming model and associated implementation for solving parallel computation problems with big-scale data. This model is based on the use of the functional primitives “map” and “reduce” present in LISP and other functional languages. Today, Hadoop, the “de facto” open-source implementation of MapReduce, is used by a wide variety of companies, institutions and universities. The massive usage of this programming model has led to the creation of multiple tools associated with it (which has come to be known as the Hadoop ecosystem) and even specialized companies like Cloudera engaged in training programmers to use it. NoSQL Data Modeling Techniques « Highly Scalable Blog. NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency.
This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage and fundamental results on distributed systems like the CAP theorem apply well to NoSQL systems. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques. I would like to thank Daniel Kirkdorffer who reviewed the article and cleaned up the grammar. A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak. Network World - "The more alternatives, the more difficult the choice.
" -- Abbe' D'Allanival In 2010, when the world became enchanted by the capabilities of cloud systems and new databases designed to serve them, a group of researchers from Yahoo decided to look into NoSQL. They developed the YCSB framework to assess the performance of new tools and find the best cases for their use. The results were published in the paper, "Benchmarking Cloud Serving Systems with YCSB. " The Yahoo guys did a great job, but like any paper, it could not include everything: ● The research did not provide all the information we needed for our own analysis. ● Though Cassandra, HBase, Yahoo's PNUTS, and a simple sharded MySQL implementation were analyzed, some of the databases we often work with were not covered. ● Yahoo used high-performance hardware, while it would be more useful for most companies to see how these databases perform on average hardware.