
Big Data


Gephi, an open source graph visualization and manipulation software. OGF DRMAA Working Group. Welcome to Apache™ Hadoop™! TORQUE Resource Manager. TORQUE Resource Manager provides control over batch jobs and distributed computing resources.

TORQUE Resource Manager

It is an advanced open-source product based on the original PBS project and incorporates the best of both community and professional development. It brings significant advances in scalability, reliability, and functionality, and is currently in use at tens of thousands of leading government, academic, and commercial sites throughout the world. TORQUE may be freely used, modified, and distributed under the constraints of the included license. TORQUE can integrate with Moab Workload Manager to improve overall utilization, scheduling, and administration on a cluster. Customers who purchase Moab family products also receive free support for TORQUE.

Feature areas: Fault Tolerance (additional failure conditions checked/handled; node health check script support), Scheduling Interface, Scalability, Usability.

TORQUE Status: TORQUE is freely available for download from adaptive.wpengine.com.

Apache Hadoop. Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware.

Apache Hadoop

Hadoop is an Apache top-level project being built and used by a global community of contributors and users.[2] It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules:

Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
Hadoop MapReduce – a programming model for large-scale data processing.
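To make the MapReduce programming model a little more concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps of a word count: emit key/value pairs, group them by key, then aggregate each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map and reduce steps would run in parallel on many machines and the grouping would happen over the network; this sketch only mirrors the data flow.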

Welcome to Hive! Apache Hive. Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.[1] While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix.[2][3] Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.[4] Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems such as Amazon S3 filesystem.

Apache Hive

It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. To accelerate queries, it provides indexes, including bitmap indexes.[5] By default, Hive stores metadata in an embedded Apache Derby database; other client/server databases such as MySQL can optionally be used.[6] Hive currently supports four file formats: TEXTFILE, SEQUENCEFILE, ORC, and RCFILE.[7][8]

HBase - HBase Home. HBase. HBase is not a direct replacement for a classic SQL database, although recently its performance has improved, and it is now serving several data-driven websites,[2][3] including Facebook's Messaging Platform.[4][5] In the parlance of Eric Brewer’s CAP theorem, HBase is a CP type system.

HBase

Facebook elected to implement its new messaging platform using HBase in November 2010.[4] Hbase/Cascading. Cascading is an alternative API to Hadoop MapReduce.

Hbase/Cascading

Under the covers it uses MapReduce during execution, but during development, users don't have to think in MapReduce to create solutions for execution on Hadoop. Cascading now has support for reading and writing data to and from an HBase cluster. Detailed information and access to the source code can be found on the Cascading Modules page. Cascading 1.0.1 is required. Here is a simple example showing how to "sink" data into an HBase cluster.

Note the "hBaseTap" above can be used as both a sink and a source in a Flow. Does anyone find Cascading for Hadoop Map Reduce useful. Application Platform for Enterprise Big Data. Big Data Platform. MongoDB.