background preloader


Facebook Twitter

MongoDB. MMS » Login. Yet another MongoDB Map Reduce tutorial. Background As the title says, this is yet-another-tutorial on Map Reduce using MongoDB. But two things that are different here: A problem solving approach is used, so we’ll take a problem, solve it in SQL first and then discuss Map Reduce. Lots of diagrams, so you’ll hopefully better understand how Map Reduce works. The Problem So without further ado, let us get started. The task is to find the 2 closest cities in each country, except in United States. Assumptions For sake of simplicity, we’ll represent earth as a 2D plane. SQL Solution If the distance between each pair of cities in a country were known then we could simply apply a GROUP BY statement where we divide the data by Country and find those two cities where the distance is minimum.

Now that we have distance between each pair of cities, we can now group this data by country and then proceed to select those 2 cities that have the least value for “Dist” field but still greater than zero. It is important to note the steps we followed. Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison. (Yes it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.)

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.) But, the differences between NoSQL databases are much bigger than ever was between one SQL database and another.

This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning. In this light, here is a comparison of Open Source NOSQL databases Cassandra, Mongodb, CouchDB, Redis, Riak, RethinkDB, Couchbase (ex-Membase), Hypertable, ElasticSearch, Accumulo, VoltDB, Kyoto Tycoon, Scalaris, OrientDB, Aerospike, Neo4j and HBase: The most popular ones Redis (V3.2) Cassandra (2.0) MongoDB (3.2) Welcome to Apache™ Hadoop™! Cygwin. Comprendre Hadoop en moins de 5 minutes « Java EE performance. Dans ce tutorial, nous allons découvrir Hadoop au travers de son système de fichiers distribués et son mécanisme de Map/Reduce.

Comprendre les grands concepts de Hadoop Comprendre le HDFS et le mécanisme de Map/Reduce Hadoop est un projet Open Source écrit en java, distribué par la fondation Apache. Ce framework est adapté dans le stockage et le traitement par lots de très grandes quantités de données (à partir du pétaoctet). Il a été mis en avant par des grands noms du web comme Yahoo! Son système de fichiers HDFS permet de distribuer le stockage des données et de faire des analyses très performantes sur ces données grâce au modèle MapReduce permettant de distribuer une opération sur plusieurs nœuds dans le but de paralléliser leur exécution.

Le HDFS est le système de fichiers utilisé par Hadoop. Pour plus d’informations sur l’architecture et la configuration du HDFS vous pouvez consultez la très bonne documentation de Hadoop: Il suffit d’invoquer la commande: . Hadoop fs -put logs/ Hadoop Tutorial. Home | Cloud Types | Related Technologies What is Hadoop? Miha Ahronovitz, Ahrono & Associates Kuldip Pabla, Ahrono & Associates Hadoop is a fault-tolerant distributed system for data storage which is highly scalable. The scalability is the result of a Self-Healing High Bandwith Clustered Storage , known by the acronym of HDFS (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce. (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce.

Why Hadoop as part of the IT? It processes and analyzes variety of new and older data to extract meaningful business operations wisdom. What types of data we handle today? Human-generated data that fits well into relational tables or arrays. Example of Hadoop usage Netflix (NASDAQ: NFLX) is a service offering online flat rate DVD and Blu-ray disc rental-by-mail and video streaming in the United States. They parse 0.6TB of data running on Amazon S3 50 nodes.