background preloader

ↂ Hadoop

Facebook Twitter

Hadoop: An Apache-managed software framework derived from MapReduce and Bigtable. Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware.

Hadoop is designed to parallelize data processing across computing nodes to speed computations and hide latency. Two major components of Hadoop exist: a massively scalable distributed file system that can support petabytes of data and a massively scalable MapReduce engine that computes results in batch.

Hadoop Distributed File System (HDFS): A versatile, resilient, clustered approach to managing files in a big data environment. HDFS is not the final destination for files. Rather, it is a data “service” that offers a unique set of capabilities needed when data volumes and velocity are high.

Found in: Hurwitz, J., Nugent, A., Halper, F. & Kaufman, M. (2013) Big Data For Dummies. Hoboken, New Jersey, United States of America: For Dummies.

ISBN: 9781118504222. ▼ Software. ⊿ Point. {R} Glossary. ◢ Keyword: H. ◥ University. {q} PhD. ⏫ THEMES. ⏫ Big Data. ↂ EndNote. Apache Hadoop. Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users.[2] It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules: Hadoop Common – contains libraries and utilities needed by other Hadoop modulesHadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.Hadoop MapReduce – a programming model for large scale data processing.

Apache Hadoop is a registered trademark of the Apache Software Foundation. History[edit] Hadoop was created by Doug Cutting and Mike Cafarella[5] in 2005. Architecture[edit] Apache Hadoop. Distributed data processing framework Apache Hadoop () is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use.[3] It has since also found use on clusters of higher-end hardware.[4][5] All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.[6] The base Apache Hadoop framework is composed of the following modules: Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on MapReduce and Google File System.[14] A small Hadoop cluster includes a single master and multiple worker nodes.

Hadoop requires the Java Runtime Environment (JRE) 1.6 or higher. Hadoop distributed file system [edit] Apache™ Hadoop®! What is Hadoop. About Hadoop® Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer. Apache Hadoop has two main subprojects: MapReduce - The framework that understands and assigns work to the nodes in a cluster.HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage.

Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and Zookeeper, that extend the value of Hadoop and improves its usability. So what’s the big deal? Hadoop changes the economics and the dynamics of large scale computing. Hadoop enables a computing solution that is: Think Hadoop is right for you? ☝️ BD Dummies.