MapReduce

Facebook Twitter

MongoDB. MMS » Login. Yet another MongoDB Map Reduce tutorial. Background As the title says, this is yet-another-tutorial on Map Reduce using MongoDB. But two things that are different here: A problem solving approach is used, so we’ll take a problem, solve it in SQL first and then discuss Map Reduce. Lots of diagrams, so you’ll hopefully better understand how Map Reduce works. The Problem So without further ado, let us get started. The task is to find the 2 closest cities in each country, except in United States. Assumptions For sake of simplicity, we’ll represent earth as a 2D plane. SQL Solution If the distance between each pair of cities in a country were known then we could simply apply a GROUP BY statement where we divide the data by Country and find those two cities where the distance is minimum.

Now that we have distance between each pair of cities, we can now group this data by country and then proceed to select those 2 cities that have the least value for “Dist” field but still greater than zero. It is important to note the steps we followed. Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison. (Yes it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.)

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.) But, the differences between NoSQL databases are much bigger than ever was between one SQL database and another. This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning. In this light, here is a comparison of Cassandra, Mongodb, CouchDB, Redis, Riak, Couchbase (ex-Membase), Hypertable, ElasticSearch, Accumulo, VoltDB, Kyoto Tycoon, Scalaris, Neo4j and HBase: The most popular ones MongoDB (2.2) Written in: C++ Main point: Retains some friendly properties of SQL. Riak (V1.2) Welcome to Apache™ Hadoop™!

Cygwin. Comprendre Hadoop en moins de 5 minutes « Java EE performance. Hadoop Tutorial. Home | Cloud Types | Related Technologies What is Hadoop?

Hadoop Tutorial

Miha Ahronovitz, Ahrono & Associates Kuldip Pabla, Ahrono & Associates Hadoop is a fault-tolerant distributed system for data storage which is highly scalable. The scalability is the result of a Self-Healing High Bandwith Clustered Storage , known by the acronym of HDFS (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce. (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce. Why Hadoop as part of the IT? It processes and analyzes variety of new and older data to extract meaningful business operations wisdom.

What types of data we handle today? Human-generated data that fits well into relational tables or arrays. Example of Hadoop usage Netflix (NASDAQ: NFLX) is a service offering online flat rate DVD and Blu-ray disc rental-by-mail and video streaming in the United States. They parse 0.6TB of data running on Amazon S3 50 nodes.