
MapReduce


Relational Database Experts Jump The MapReduce Shark

In this article, relational database experts David DeWitt and Michael Stonebraker compare MapReduce to traditional relational database systems (RDBMSs) and find MapReduce wanting.

They make some strong points in favor of relational databases, but the comparison is not appropriate. When I finished reading the article, I thought the authors did not understand MapReduce, the idea of data in the cloud, or why programmers might be excited about non-RDBMS ways to manage data. The article makes five points:

1. MapReduce is a step backwards in database access. They're right about that, but MapReduce is not a database system. MapReduce has the same relationship to RDBMSs as my motorcycle has to a snowplow: it's a step backwards in snowplow technology if you look at it that way.
2. 3. 4.

RDBMSs are great tools for managing large sets of structured data, enforcing integrity, optimizing queries, and separating the data structure and schema from the application.

The Anatomy of a Search Engine

Sergey Brin and Lawrence Page
{sergey, page}@cs.stanford.edu
Computer Science Department, Stanford University, Stanford, CA 94305

Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext.

Can Your Programming Language Do This?

By Joel Spolsky, Tuesday, August 01, 2006

One day, you're browsing through your code, and you notice two big blocks that look almost exactly the same. In fact, they're exactly the same, except that one block refers to "Spaghetti" and one block refers to "Chocolate Moose."

    // A trivial example:
    alert("I'd like some Spaghetti!");
    alert("I'd like some Chocolate Moose!");

These examples happen to be in JavaScript, but even if you don't know JavaScript, you should be able to follow along. The repeated code looks wrong, of course, so you create a function:

    function SwedishChef( food )
    {
        alert("I'd like some " + food + "!");
    }

    SwedishChef("Spaghetti");
    SwedishChef("Chocolate Moose");

OK, it's a trivial example, but you can imagine a more substantial example. Now you notice two other blocks of code which look almost the same, except that one of them keeps calling this function called BoomBoom and the other one keeps calling this function called PutInPot.
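The BoomBoom/PutInPot situation can be sketched as runnable JavaScript for Node.js. The functions return strings instead of calling alert so the result is easy to inspect. The bodies of BoomBoom and PutInPot and the helper name Cook are hypothetical, chosen only for this sketch. The shared steps move into one function, and the part that differs is passed in as an argument that is itself a function.

```javascript
// Hypothetical stand-ins for the two functions named in the article.
// Each returns a string describing what it did.
function PutInPot(item) {
  return "put the " + item + " in the pot";
}

function BoomBoom(item) {
  return "boom boom the " + item;
}

// Hypothetical helper: the shared steps live here once, and the step that
// varies is the parameter `f`, which is itself a function.
function Cook(first, second, f) {
  const steps = ["get the " + first];
  steps.push(f(first));
  steps.push(f(second));
  return steps;
}

// Two near-identical code blocks collapse into two one-line calls.
const lobster = Cook("lobster", "water", PutInPot);
const chicken = Cook("chicken", "coconut", BoomBoom);

console.log(lobster.join("; "));
console.log(chicken.join("; "));
```

Passing `PutInPot` or `BoomBoom` without parentheses hands over the function itself rather than its result. That is what lets `Cook` decide when to call it.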

Now you need a way to pass an argument to the function which itself is a function. Look! Lemme repeat that.

Map-Reduce With Ruby Using Apache Hadoop

Guest re-post from Phil Whelan, a large-scale web-services consultant based in Vancouver, BC.

Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.

Fire Up Your Hadoop Cluster

I chose Cloudera's Distribution for Apache Hadoop, which is 100% Apache licensed but has some additional benefits. I am going to use Cloudera's Whirr script, which will allow me to fire up a production-ready Hadoop cluster on Amazon EC2 directly from my laptop.
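The post writes its mapper and reducer as Ruby scripts. To keep this page's examples in one language, the sketch below shows the same idea in JavaScript. It follows Hadoop Streaming's usual contract: mapper output is tab-separated key/value lines, and the reducer sees those lines sorted by key. Several things are assumptions and not taken from the post: word counting as the job, and the names mapLine, reduceSorted and runLocally. The code also runs locally under Node.js, with an in-memory sort standing in for the shuffle. It does not talk to Hadoop, HDFS or Whirr.

```javascript
// Mapper (hypothetical word-count job): one input line in,
// zero or more "word\t1" lines out.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => word + "\t1");
}

// Reducer: consumes "key\tvalue" lines that are already sorted by key,
// so all lines for one key are adjacent; emits one "key\tsum" line per key.
function reduceSorted(lines) {
  const out = [];
  let currentKey = null;
  let sum = 0;
  for (const line of lines) {
    const [key, value] = line.split("\t");
    if (key !== currentKey) {
      if (currentKey !== null) out.push(currentKey + "\t" + sum);
      currentKey = key;
      sum = 0;
    }
    sum += Number(value);
  }
  if (currentKey !== null) out.push(currentKey + "\t" + sum);
  return out;
}

// Local stand-in for map -> shuffle/sort -> reduce on a cluster.
function runLocally(inputLines) {
  const mapped = inputLines.flatMap(mapLine);
  mapped.sort(); // plays the role of the framework's sort by key
  return reduceSorted(mapped);
}

const result = runLocally(["the quick fox", "The lazy dog", "the end"]);
console.log(result.join("\n"));
```

The reducer keeps only a running total for the current key. This works because sorting guarantees each key's lines arrive together, so it never has to hold all keys in memory at once.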

MapReduce Tutorial

This document gives a quick example of how to use the MapReduce implementation described in [1]. The example is available as a tarball (updated 31 July 2010). Please note that the MapReduceScheduler.c file differs slightly from the version released by the original authors: the version in the tarball has been modified to compile cleanly with GCC 4.0.2. The files are also available as syntax-highlighted HTML (the MapReduce implementation is not shown, and fatals.* are elided).
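The tutorial's running example is summing an array with MapReduce. As a language-neutral illustration, here is a JavaScript sketch of that idea; it is not the C interface of the implementation described in [1]. The input is split into chunks, each map call emits one partial sum under a single shared key, and reduce adds the partial sums. The chunk size and all function names are hypothetical choices for this sketch.

```javascript
// Split the input into fixed-size chunks; each chunk is one map task.
function splitIntoChunks(values, chunkSize) {
  const chunks = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    chunks.push(values.slice(i, i + chunkSize));
  }
  return chunks;
}

// Map: sum one chunk and emit the partial sum under a single key,
// so every partial sum is routed to the same reduce call.
function mapChunk(chunk) {
  let partial = 0;
  for (const v of chunk) partial += v;
  return [["sum", partial]];
}

// Reduce: combine all values emitted for one key.
function reduceKey(key, partials) {
  return [key, partials.reduce((acc, p) => acc + p, 0)];
}

// Driver: run the map tasks, group emitted pairs by key, reduce each group.
function mapReduceSum(values, chunkSize) {
  const groups = new Map();
  for (const chunk of splitIntoChunks(values, chunkSize)) {
    for (const [key, value] of mapChunk(chunk)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const out = [];
  for (const [key, partials] of groups) out.push(reduceKey(key, partials));
  return out;
}

const data = Array.from({ length: 10 }, (_, i) => i + 1); // 1..10
const results = mapReduceSum(data, 3);
console.log(results[0][1]); // 55
```

Emitting everything under one key is the simplest choice for a scalar total. It also means the final reduce step is a single, serial call, while the per-chunk map work is what could run in parallel.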

Notice (updated 31 July 2010): Since this tutorial has gained some popularity among non-Wisconsin users, I've modified the files in the tarball to build by default on x86/Linux instead of SPARC/Solaris.

Caveat: The MapReduce implementation seems to break when a very large number of keys are emitted (e.g., 2 billion+).

Unpacking the Tarball

MapReduce Programming Model

Quoting [1] directly:

Summing an Array with MapReduce

The Map Function

Introduction to Parallel Programming and MapReduce - Google Code University