Raspberry PI - Big data

The history of Hadoop – Medium. The story begins on a sunny afternoon, sometime in 1997, when Doug Cutting (“the man”) started writing the first version of Lucene.

What is Lucene, you ask. TL;DR: generally speaking, full text search is what lets a search engine such as Google return results with sub-second latency. Apache Lucene is a full text search library. OK, great, but what is a full text search library? A full text search library is used to analyze ordinary text with the purpose of building an index. It took Cutting only three months to have something usable. By the end of the year, already having a thriving Apache Lucene community behind him, Cutting turned his focus towards indexing web pages.

PageRank algorithm

An important algorithm that is used to rank web pages by their relative importance is called PageRank, after Larry Page, who came up with it (I’m serious, the name has nothing to do with web pages). It’s really a simple and brilliant algorithm, which basically counts how many links from other pages on the web point to a page.

The origins of HDFS. Welcome to Apache™ Hadoop®!
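The link-counting idea behind PageRank described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any search engine: the function names `count_inbound_links` and `pagerank`, the dictionary-based graph format, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example. The first function does the plain counting the text mentions; the second shows the usual refinement in which a link from an important page counts for more.

```python
def count_inbound_links(links):
    """Count how many *other* pages link to each page.

    `links` maps a page name to the list of pages it links to
    (a hypothetical in-memory stand-in for a crawled web graph).
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):      # count each source page once
            if target != source:         # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank scores for a small link graph."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(count_inbound_links(graph))
    print(pagerank(graph))
```

In the small example graph, page `c` is linked from three other pages, so it ends up with both the highest inbound count and the highest score.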

The history of Hadoop – Medium. Gfs sosp2003. St-tutor5-R-mapreduce.pdf. Step by step to build my first R Hadoop System. R and Hadoop: Step-by-step tutorials. Building an R Hadoop System - R and Data Mining. The information provided in this page might be out-of-date.

Please see a newer version at Step-by-Step Guide to Setting Up an R-Hadoop System. This page shows how to build an R Hadoop system, and presents the steps to set up my first R Hadoop system in single-node mode on Mac OS X. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it.

Here I’d like to share my experience and steps to achieve that. Hopefully it will make it easier to try RHadoop for R users who are new to Hadoop. Note that I tried this on Mac only and some steps might be different for Windows. Before going through the complex steps below, let’s have a look at what you can get, to give you some motivation to continue. Now let’s start. How-to-build-a-7-node-Raspberry-Pi-Hadoop-Cluster.pdf. Getting hadoop to run on the Raspberry Pi. Hadoop was implemented in Java, so getting it to run on the Pi is just as easy as doing so on x86 servers.

First of all, we need a JVM for the Pi. You can either get OpenJDK or Oracle’s JDK 8 for ARM Early Access. I would personally recommend JDK 8, as it is just slightly faster than OpenJDK, although OpenJDK is easier to install.

1. Install Java

Installing OpenJDK is easy, just run the following and wait:

pi@raspberrypi ~ $ sudo apt-get install openjdk-7-jdk
pi@raspberrypi ~ $ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2a-1+rpi1)
OpenJDK Zero VM (build 22.0-b10, mixed mode)

Alternatively, you can install Oracle’s JDK 8 for ARM Early Access (some said it was optimized for the Pi).

If you have both versions installed, you can switch between them with. Hadoop on a Raspberry Pi. Looking for a fun side project this winter?

Jamie Whitehorn has an idea for you. He put Hadoop on a cluster of Raspberry Pi mini-computers. Sound ridiculous? For a student trying to learn Hadoop, it could be ridiculously cool. Raspberry Pi. A Hadoop data lab project on Raspberry Pi - Par... Carsten Mönning and Waldemar Schiller. Hadoop has developed into a key enabling technology for all kinds of Big Data analytics scenarios.

Although Big Data applications have started to move beyond the classic batch-oriented Hadoop architecture towards near real-time architectures such as Spark, Storm, etc., [1] a thorough understanding of the Hadoop, MapReduce and HDFS principles, and of services such as Hive, HBase, etc. operating on top of the Hadoop core, still remains one of the best starting points for getting into the world of Big Data. Renting a Hadoop cloud service or even getting hold of an on-premise Big Data appliance will get you Big Data processing power but no real understanding of what is going on behind the scenes. To inspire your own little Hadoop data lab project, this four-part blog series will provide a step-by-step guide for the installation of open source Apache Hadoop from scratch on Raspberry Pi 2 Model B over the course of the next three to four weeks.
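As a small taste of the MapReduce principle mentioned above, here is a minimal single-process sketch of the classic word count in Python. It does not use Hadoop or any Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and exist only to make the three stages visible. On a real cluster the map and reduce steps would run in parallel on different nodes, with the framework handling the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text documents."""
    pairs = (pair for document in documents for pair in map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on a small pi", "big cluster of small pi boards"]))
```

The point of splitting the work this way is that each map call only needs one document and each reduce call only needs one key, which is what lets a framework spread the work over many machines.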

Preliminaries su hduser. Raspberry PI Hadoop Cluster - Jonas Widriksson. If you like Raspberry Pis and would like to get into distributed computing and Big Data processing, what could be better than creating your own Raspberry Pi Hadoop cluster?

The tutorial does not assume that you have any previous knowledge of Hadoop. Hadoop is a framework for the storage and processing of large amounts of data, or “Big Data”, which is a pretty common buzzword these days. The performance of Hadoop on a Raspberry Pi is probably terrible, but I hope to be able to make a small, fully functional cluster to see how it works and performs.

For a tutorial on Hadoop 2 please see my newer post. In this tutorial we start with one Raspberry Pi and then add two more once we have a working single node. Big Data University. Useful Stuff.