
HBase


HBase datacenter replication. HBase should consider supporting a federated deployment where someone might have terascale (or larger) clusters in more than one geography and would want the system to handle replication between the clusters/regions.


It would be sweet if HBase had something on the roadmap to sync between replicas out of the box. Consider if rows, columns, or even cells could be scoped: local, or global. Then, consider a background task on each cluster that replicates new globally scoped edits to peer clusters. The HBase/Bigtable data model has convenient features (timestamps, multiversioning) such that simple exchange of globally scoped cells would be conflict free and would "just work".
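To illustrate why timestamps and multiversioning could make this exchange conflict free, here is a minimal Python sketch. It assumes a toy in-memory store (the names `Scope`, `ReplicaStore`, and `replicate_to` are hypothetical, not HBase APIs) in which every edit is a cell at its own (row, column, timestamp) coordinate. Replicating globally scoped cells then amounts to a set union, so the order and repetition of shipments do not matter. It also assumes two clusters never write different values to the same coordinate at the same timestamp, which real clock skew would not guarantee.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class Cell:
    value: str
    scope: Scope


class ReplicaStore:
    """Hypothetical toy cluster: cells keyed by (row, column, timestamp), all versions kept."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, ts, value, scope=Scope.LOCAL):
        self.cells[(row, column, ts)] = Cell(value, scope)

    def replicate_to(self, peer):
        """Ship every globally scoped cell to the peer.

        Each edit lives at its own (row, column, timestamp) coordinate, so
        applying it is a set union: order and replays do not matter, provided
        the clusters never write the same coordinate with different values
        (an assumption of this sketch).
        """
        for key, cell in self.cells.items():
            if cell.scope is Scope.GLOBAL:
                peer.cells.setdefault(key, cell)

    def get(self, row, column):
        """Return the newest version of a cell, as a default read would."""
        versions = [(ts, c.value) for (r, col, ts), c in self.cells.items()
                    if r == row and col == column]
        return max(versions)[1] if versions else None

    def global_cells(self):
        return {k: c for k, c in self.cells.items() if c.scope is Scope.GLOBAL}


# Two clusters write to the same cell at different timestamps.
east, west = ReplicaStore(), ReplicaStore()
east.put("row1", "cf:a", 100, "from-east", Scope.GLOBAL)
west.put("row1", "cf:a", 200, "from-west", Scope.GLOBAL)
east.put("row1", "cf:tmp", 150, "east-only")  # locally scoped, never shipped

east.replicate_to(west)
west.replicate_to(east)
west.replicate_to(east)  # replaying a batch is harmless

print(east.get("row1", "cf:a"), west.get("row1", "cf:a"))  # from-west from-west
```

Both clusters converge on the same set of global cells and the same newest version, while the locally scoped `cf:tmp` cell stays on the cluster that wrote it.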

Implementation effort here would be in producing an efficient mechanism for collecting edits from all the HRSs (HRegionServers) and transmitting them over the network to peers, where they would then be split out to the HRSs there. This proposal does not consider transactional tables.

HBase coprocessors. From Google's Jeff Dean, in a keynote to LADIS 2009 (slides 66-67):

BigTable Coprocessors (New Since OSDI'06)
- Arbitrary code that runs next to each tablet in a table
- As tablets split and move, coprocessor code automatically splits/moves too
- High-level call interface for clients
- Unlike RPC, calls are addressed to rows or ranges of rows; the coprocessor client library resolves them to actual locations
- Calls across multiple rows are automatically split into multiple parallelized RPCs
- Very flexible model for building distributed services
- Automatic scaling, load balancing, request routing for apps

Example Coprocessor Uses
- Scalable metadata management for Colossus (next-gen GFS-like file system)
- Distributed language model serving for machine translation system
- Distributed query processing for full-text indexing support
- Regular expression search support for code repository

For HBase, adding a coprocessor framework will allow for pluggable, incremental addition of functionality.
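The client-side routing described in the slides (calls addressed to rows, resolved to actual locations, and split into parallel per-location calls) can be sketched in Python as below. The region layout, `sum_coprocessor`, and `call_rows` are hypothetical stand-ins: the "coprocessor" is just a local function and the parallel RPCs are threads, so this only illustrates the resolve, fan-out, and merge pattern, not HBase's actual coprocessor API.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split into regions by start key ("" = first region).
REGION_START_KEYS = ["", "g", "p"]
REGION_DATA = [
    {"apple": 3, "banana": 5},   # region 0: ["", "g")
    {"grape": 7, "kiwi": 1},     # region 1: ["g", "p")
    {"pear": 2, "plum": 4},      # region 2: ["p", end of table)
]


def region_for(row: str) -> int:
    """Resolve a row key to the index of the region whose key range holds it."""
    return bisect.bisect_right(REGION_START_KEYS, row) - 1


def sum_coprocessor(region: int, rows: list[str]) -> int:
    """Code that would run next to one region's data: sums the requested rows."""
    data = REGION_DATA[region]
    return sum(data.get(r, 0) for r in rows)


def call_rows(rows, fn):
    """Client-side library: group rows by region, call each region in parallel."""
    by_region = {}
    for row in rows:
        by_region.setdefault(region_for(row), []).append(row)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, region, rs) for region, rs in by_region.items()]
        return [f.result() for f in futures]


partials = call_rows(["apple", "kiwi", "plum", "banana"], sum_coprocessor)
print(partials, sum(partials))  # [8, 1, 4] 13
```

Because each call carries only the rows that live in one region, the per-region work can run where the data is, and the client only merges small partial results.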


Lineland: HBase vs. BigTable Comparison. HBase is an open-source implementation of the Google BigTable architecture. That part is fairly easy to understand and grasp. What I personally find a bit more difficult is understanding how much of BigTable HBase covers and where there are (still) differences compared to the BigTable specification. This post is an attempt to compare the two systems. Before we embark on the dark technology side of things, I would like to point out one thing up front: HBase is very close to what the BigTable paper describes. Putting aside minor differences, as of HBase 0.20, which uses ZooKeeper as its distributed coordination service, it has all the means to be a nearly exact implementation of BigTable's functionality.

Scope
The comparison in this post is based on the OSDI'06 paper that describes the system Google implemented in about seven person-years and which has been in operation since 2005.

Terminology
There are a few different terms used in either system describing the same thing.

Lineland: HBase Architecture 101 - Storage. One of the more hidden aspects of HBase is how data is actually stored. While most users may never have to bother with it, you may need to get up to speed when you want to learn what the various advanced configuration options at your disposal mean. "How can I tune HBase to my needs?" and similar questions are certainly interesting once you get over the (at times steep) learning curve of setting up a basic system. Another reason for wanting to know more is if, for whatever reason, disaster strikes and you have to recover an HBase installation.

In my own efforts to get to know the respective classes that handle the various files, I started to sketch a picture in my head illustrating the storage architecture of HBase. Please note that this is not a UML diagram or call graph but a merged picture of classes and the files they handle; it is by no means complete, though it focuses on the topic of this post. So what does my sketch of the HBase innards really say?
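As a rough orientation before looking at the real classes, the general HBase write path (edits go to a write-ahead log and an in-memory MemStore, which is flushed to immutable sorted store files) can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: `ToyStore`, `flush_threshold`, and `recover` are hypothetical names, the "files" are in-memory lists, and compactions, column families, versions, and HDFS are all left out.

```python
import bisect


class ToyStore:
    """Hypothetical toy model: write-ahead log + memstore + immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []           # append-only log of edits not yet in a store file
        self.memstore = {}      # recent edits held in memory
        self.store_files = []   # flushed, immutable sorted lists of (key, value); newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the edit first for durability
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as a new sorted, immutable file."""
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []   # simplified: flushed edits no longer need the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.store_files):       # newest file wins
            i = bisect.bisect_left(f, (key,))      # (key,) sorts just before (key, value)
            if i < len(f) and f[i][0] == key:
                return f[i][1]
        return None

    @classmethod
    def recover(cls, store_files, wal, flush_threshold=3):
        """After a crash the memstore is lost: keep the files, replay the log."""
        store = cls(flush_threshold)
        store.store_files = list(store_files)
        store.wal = list(wal)
        for key, value in wal:
            store.memstore[key] = value
        return store


s = ToyStore(flush_threshold=2)
s.put("r1", "a")
s.put("r2", "b")    # memstore reaches the threshold and is flushed
s.put("r1", "a2")   # newer value lives only in the memstore and the log

# Simulate a crash: rebuild from the durable parts only.
recovered = ToyStore.recover(s.store_files, s.wal, flush_threshold=2)
print(recovered.get("r1"), recovered.get("r2"))  # a2 b
```

The point of the log is visible in the recovery step: the unflushed edit to `r1` survives only because it was written to the log before being applied in memory.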