Big_DATA_Tools

A MapReduce Algorithm for Matrix Multiplication.

Algorithm Overview

Notation and Definitions

Suppose:
  Matrix A has dimension IxK with elements a(i,k) for 0 <= i < I and 0 <= k < K
  Matrix B has dimension KxJ with elements b(k,j) for 0 <= k < K and 0 <= j < J

Then:
  Matrix C = A*B has dimension IxJ with elements c(i,j) defined as:
  c(i,j) = sum over 0 <= k < K of a(i,k)*b(k,j)

We split A and B into blocks (sub-matrices) small enough so that a pair of blocks can be multiplied in memory on a single node in the cluster.

Let:
  IB = Number of rows per A block and C block.

We use the following notation for the blocks.
  0 <= ib < NIB
  0 <= kb < NKB
  0 <= jb < NJB

Define:
  A[ib,kb] = The block of A consisting of
    rows IB*ib through min(IB*(ib+1),I)-1
    columns KB*kb through min(KB*(kb+1),K)-1
  B[kb,jb] = The block of B consisting of
    rows KB*kb through min(KB*(kb+1),K)-1
    columns JB*jb through min(JB*(jb+1),J)-1
  C[ib,jb] = The block of C consisting of
    rows IB*ib through min(IB*(ib+1),I)-1
    columns JB*jb through min(JB*(jb+1),J)-1
  C[ib,kb,jb] = A[ib,kb] * B[kb,jb]

Note that:
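The block definitions above can be tried out on a single machine. The following is a minimal Python sketch, not the algorithm's actual Hadoop implementation. It assumes that KB is the number of columns per A block (and rows per B block), that JB is the number of columns per B block and C block, and that NIB, NKB and NJB are the block counts ceil(I/IB), ceil(K/KB) and ceil(J/JB); the excerpt does not spell these out. The function names are hypothetical. The "map" step computes each partial product C[ib,kb,jb], and the "reduce" step relies on the standard block-multiplication identity that C[ib,jb] is the sum of C[ib,kb,jb] over all kb.

```python
from collections import defaultdict
from math import ceil


def matmul(a, b):
    """Plain in-memory product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def block(m, r0, r1, c0, c1):
    """Rows r0..r1-1 and columns c0..c1-1 of matrix m."""
    return [row[c0:c1] for row in m[r0:r1]]


def blocked_multiply(a, b, IB, KB, JB):
    """Multiply a (IxK) by b (KxJ) block by block, MapReduce style."""
    I, K, J = len(a), len(b), len(b[0])
    # Assumed block counts; the last block in each direction may be smaller.
    NIB, NKB, NJB = ceil(I / IB), ceil(K / KB), ceil(J / JB)

    # "Map" step: emit each partial product C[ib,kb,jb] under the key (ib, jb).
    partials = defaultdict(list)
    for ib in range(NIB):
        for kb in range(NKB):
            a_blk = block(a, IB * ib, min(IB * (ib + 1), I),
                          KB * kb, min(KB * (kb + 1), K))
            for jb in range(NJB):
                b_blk = block(b, KB * kb, min(KB * (kb + 1), K),
                              JB * jb, min(JB * (jb + 1), J))
                partials[(ib, jb)].append(matmul(a_blk, b_blk))

    # "Reduce" step: sum the partial products for each key into C[ib,jb].
    c = [[0] * J for _ in range(I)]
    for (ib, jb), blocks in partials.items():
        for blk in blocks:
            for r, row in enumerate(blk):
                for col, value in enumerate(row):
                    c[IB * ib + r][JB * jb + col] += value
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_multiply(a, b, IB=1, KB=2, JB=1))
```

In a real cluster each (ib, kb, jb) product would run on a separate node and the grouping by (ib, jb) would be done by the shuffle phase; here a dictionary stands in for both.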

Statistics Course Home Page. Glen Cowan, Royal Holloway, University of London, phone: (01784) 44 3452, e-mail: g.cowan@rhul.ac.uk

Time & Place: The lectures take place at UCL, Mondays 3:00 to 6:00, starting on 30 September, in UCL Physics/Union Building D103. This is on the first floor of the Union building (see the map here, ref. D1).

Minor change to course structure: For the first four weeks, we will use the time from 3 to 4:30 for statistics and from 4:30 to 6 for computing. This should allow us to finish the C++ part of the course within 4 weeks. The computing part of the course is optional for the PhD students (check with your supervisor) but mandatory for the MSci/MSc students. From week five, the lectures on statistical data analysis will run from 3 to 5.

Aims: This series of lectures is intended for PhD students in Particle Physics and it also forms the University of London MSci course PH4515.

Syllabus: A general outline of the course topics. Problem sheets: Rob Miller's C++ Course (Imperial) G. T. Lecture Notes (2012):

Hadoop. Mapreduce.

Haloop - Project Hosting on Google Code.

Why do we develop the HaLoop project? The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. However, these new platforms do not have built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph processing, model fitting, and so on.

What is HaLoop? Simply speaking, HaLoop = Ha, Loop:-) HaLoop is a modified version of the Hadoop MapReduce framework, designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, but also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms.

Get started: HaLoop publications. Contact: Yingyi Bu (buyingyi@gmail.com)

Designs, Lessons and Advice from Building Large Distributed Syst.

CouchDB: The CouchDB Project.

Exploring CouchDB.

What is CouchDB? CouchDB is an open source document-oriented database-management system, accessible using a RESTful JavaScript Object Notation (JSON) API. The term "Couch" is an acronym for "Cluster Of Unreliable Commodity Hardware," reflecting the goal of CouchDB being extremely scalable, offering high availability and reliability, even while running on hardware that is typically prone to failure.

CouchDB was originally written in C++, but in April 2008, the project moved to the Erlang OTP platform for its emphasis on fault tolerance. CouchDB can be installed on most POSIX systems, including Linux® and Mac OS X. Although Windows® isn't currently officially supported, work is under way on an unofficial binary installer for the Windows platform.

CouchDB is a top-level Apache Software Foundation open source project, released under version 2.0 of the Apache License.

Differences between a document-oriented and a relational database. How CouchDB works. The RESTful JSON API. Summary.
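To give a feel for what a RESTful JSON API means in practice, here is a small Python sketch (not taken from the article) that builds, without sending, an HTTP request to store a JSON document. It relies on CouchDB's convention of creating a document with PUT /{database}/{document id}, and on 5984 as the default port. The database name, document ID, document fields and the helper name `build_put_document` are all made up for illustration.

```python
import json
import urllib.request


def build_put_document(base_url, db, doc_id, doc):
    """Build (but do not send) a request that stores doc as JSON under db/doc_id."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical database and document, assuming a server on the default port.
req = build_put_document(
    "http://localhost:5984", "example_db", "doc-001", {"title": "hello"}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a CouchDB server actually running and the database already created, passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run anywhere.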