
HA Data Store

Using HBase Snapshots - Amazon Elastic MapReduce

HBase uses built-in snapshot functionality to create lightweight backups of tables. In EMR clusters, these backups can be exported to Amazon S3 using EMRFS. You can create a snapshot on the master node using the HBase shell. This topic shows you how to run these commands interactively with the shell, or through a step using command-runner.jar with either the AWS CLI or the AWS SDK for Java.
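As a sketch of that flow (the table name, snapshot name, S3 bucket, and cluster ID below are placeholders, not values from the article):

```shell
# Create a snapshot from the HBase shell on the master node
# ("usertable" and the snapshot name are hypothetical).
echo "snapshot 'usertable', 'usertable-20160101'" | hbase shell

# Export the snapshot to S3; on EMR, EMRFS lets the bundled
# ExportSnapshot tool write directly to an s3:// URI.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot usertable-20160101 \
    -copy-to s3://my-bucket/hbase-snapshots/

# The same snapshot, submitted non-interactively as an EMR step
# through command-runner.jar with the AWS CLI (cluster ID is a placeholder):
aws emr add-steps --cluster-id j-XXXXXXXXXXXX \
    --steps 'Name=CreateSnapshot,Jar=command-runner.jar,Args=[hbase,snapshot,create,-n,usertable-20160101,-t,usertable]'
```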

Spark Streaming with HBase

This post will help you get started using Apache Spark Streaming with HBase on the MapR Sandbox. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing. (Editor's note: the free e-book "Getting Started with Apache Spark: From Inception to Production" is available for download.) This post is the fifth in a series; if you are new to Spark, read the earlier posts first.

What is Spark Streaming? First of all, what is streaming? Typical streaming use cases include website monitoring, network monitoring, fraud detection, web clicks, advertising, and Internet of Things sensor data. Spark Streaming supports data sources such as HDFS directories, TCP sockets, Kafka, Flume, and Twitter.

How Spark Streaming works: streaming data is continuous and needs to be batched before it can be processed.
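To illustrate the micro-batch model (this is not the article's own example; the port number and the 10-second batch interval are arbitrary), a socket word count can be pasted into spark-shell while netcat feeds it lines:

```shell
# Terminal 1: a toy line-oriented source on port 9999.
nc -lk 9999

# Terminal 2: Spark Streaming groups the continuous stream into
# 10-second micro-batches (RDDs) and processes each batch in turn.
spark-shell <<'EOF'
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))       // batch interval
val lines = ssc.socketTextStream("localhost", 9999)   // continuous source
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()                                        // per-batch word counts
ssc.start()
ssc.awaitTermination()
EOF
```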

Architecture of the example streaming application: the Spark Streaming example code reads streaming data.

Apache Spark Comes to Apache HBase with HBase-Spark Module - Cloudera Engineering Blog

The SparkOnHBase project in Cloudera Labs was recently merged into the Apache HBase trunk. In this post, learn the project's history and what the future looks like for the new HBase-Spark module. SparkOnHBase was first pushed to GitHub in July 2014, just six months after Spark Summit 2013 and five months after Apache Spark first shipped in CDH. That conference was a big turning point for me, because for the first time I realized that the MapReduce engine had a very strong competitor. Spark was about to enter an exciting new phase in its open-source life cycle, and just one year later it is used at massive scale at hundreds, if not thousands, of companies (200+ of them on Cloudera's platform).


Access HBase Tables with Hive - Amazon Elastic MapReduce

HBase and Hive on Amazon EMR (EMR 3.x releases) are tightly integrated, allowing you to run massively parallel processing workloads directly on data stored in HBase. To use Hive with HBase, you would usually launch them on the same cluster. You can, however, launch Hive and HBase on separate clusters; running them separately can improve performance because each application can then fully utilize its own cluster's resources.
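A minimal sketch of the integration (the table name and column mapping are hypothetical; the storage-handler class and table properties are the standard Hive-HBase ones):

```shell
# Expose an existing HBase table ("pagecounts", with a column family "f")
# to Hive via the HBase storage handler, then query it with HiveQL.
hive -e "
CREATE EXTERNAL TABLE pagecounts_hbase (rowkey STRING, pageviews STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:pageviews')
TBLPROPERTIES ('hbase.table.name' = 'pagecounts');

SELECT rowkey, pageviews FROM pagecounts_hbase LIMIT 10;
"

# When Hive and HBase run on separate clusters, point Hive at the remote
# HBase cluster's ZooKeeper quorum before querying, e.g. (hostname is a
# placeholder):
#   SET hbase.zookeeper.quorum=ip-10-0-0-1.ec2.internal;
```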

Chapter 14. Apache HBase (TM) Operational Management

This chapter covers the operational tools and practices required of a running Apache HBase cluster. The subject of operations is related to other topics in this guide, but is a distinct topic in itself. Here we list HBase tools for administration, analysis, fixup, and debugging.

There is a Driver class, executed via the HBase jar, that can be used to invoke frequently accessed utilities. For example, running

    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar

with no further arguments will return output like:

    An example program must be given as the first argument.
    ... for allowable program names.

(For example, rowcounter is among the bundled program names.)

Apache HBase ™ Reference Guide

Phoenix in 15 minutes or less

What is this new Phoenix thing I've been hearing about? Phoenix is an open-source SQL skin for HBase. You use the standard JDBC APIs, instead of the regular HBase client APIs, to create tables, insert data, and query your HBase data.

Doesn't putting an extra layer between my application and HBase just slow things down? Actually, no. Phoenix achieves performance as good as, or likely better than, hand-coded use of the HBase client (not to mention with a lot less code) by: compiling your SQL queries into native HBase scans; determining the optimal start and stop for your scan key; orchestrating the parallel execution of your scans; bringing the computation to the data by pushing the predicates in your WHERE clause to a server-side filter; and executing aggregate queries through server-side hooks (called coprocessors). In addition to these items, some interesting enhancements are in the works to further optimize performance. OK, so it's fast.
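To try Phoenix, its sqlline wrapper connects through ZooKeeper; a first session might look like this (the table and values are made up, and `localhost` assumes a local HBase with Phoenix installed):

```shell
# Launch Phoenix's SQL shell against the local ZooKeeper quorum.
./bin/sqlline.py localhost

# Inside sqlline, standard SQL is compiled down to native HBase scans:
#   CREATE TABLE IF NOT EXISTS metrics (
#     host VARCHAR NOT NULL PRIMARY KEY,
#     val  BIGINT);
#   UPSERT INTO metrics VALUES ('host1', 42);
#   SELECT * FROM metrics;
```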

Blah, blah, blah - I just want to get started!

HBase client application best practices - Hortonworks

Apache ZooKeeper - Home