Column Oriented

The Apache Cassandra Project.

Cassandra Secondary Index Patterns. We all know that any real application needs to query on attributes other than the primary key, or the row key in the case of Cassandra.

Cassandra Secondary Index Patterns

Cassandra 0.7 onwards provides native secondary index support, but there are several limitations.

Native Secondary Index

Cassandra’s native index is like a hashed index, which means you can only do equality queries, not range queries. The link I just mentioned shows how you can do a range query on one attribute once you have an index on another attribute, as in the following query written in SQL:

select * from product where category = xxx and price < yyy

This kind of query will work in Cassandra, provided an index is defined for category. The other limitation is interesting. For attributes with high cardinality, the documentation recommends using separate column families and implementing your own secondary index.

Secondary Index Patterns

The index storage structure differs depending on whether the whole index is stored in one row or not.

Planning a Cassandra cluster deployment.

Cassandra Data Modeling Best Practices, Part 1 - eBay Tech Blog.
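As a rough illustration of the "implement your own secondary index" idea from the Cassandra Secondary Index Patterns excerpt above, the sketch below keeps a second structure that maps an attribute value to row keys, answers the equality part of the query through it, and then filters the range predicate. It uses plain in-memory dictionaries as stand-ins for column families; the names `products`, `category_index`, `put_product` and `find` are hypothetical and not part of any Cassandra API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families:
# `products` maps row key -> columns, and `category_index` maps an
# indexed value -> set of row keys (one index "row" per attribute value).
products = {}
category_index = defaultdict(set)


def put_product(row_key, category, price):
    """Write the data row and keep the hand-rolled index in step with it."""
    old = products.get(row_key)
    if old is not None:
        # Drop the stale index entry before re-indexing the row.
        category_index[old["category"]].discard(row_key)
    products[row_key] = {"category": category, "price": price}
    category_index[category].add(row_key)


def find(category, max_price):
    """Equality lookup through the index, then filter the range predicate."""
    keys = category_index.get(category, set())
    return sorted(k for k in keys if products[k]["price"] < max_price)


put_product("p1", "books", 12.0)
put_product("p2", "books", 40.0)
put_product("p3", "toys", 8.0)
put_product("p2", "toys", 5.0)  # p2 changes category, so the index is updated
```

Calling `find("books", 20)` mirrors the SQL query `category = xxx and price < yyy`. In a real cluster the data write and the index write are separate operations, so keeping the two consistent is the application's job.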

Developing LDTV: Cassandra data model for transactions. In the past, I've built quite an extensive transaction system with a lot of features, among them prepaid cards, subscriptions, simple movie rentals and package rentals.

Developing LDTV: Cassandra data model for transactions

Credit card details, fancy service subscriptions (like AllCharge) and preapprovals (PayPal). It has an all-or-nothing transaction policy, which means that if the last step fails, everything you did so far is rolled back. For example, if you buy something with a prepaid card but the card doesn't hold enough credit, the card will be charged all it can and the rest will be charged through your primary payment method. Which is good, but what happens if PayPal decides it can't process your transaction? Whatever you did so far has to be rolled back, so you won't be charged on the preapproval either. This is a really cool feature of ACID-compliant databases (I use InnoDB on MySQL).
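The all-or-nothing behaviour described above can be sketched with any transactional database. The example below uses Python's built-in `sqlite3` module as a stand-in for InnoDB on MySQL, purely so that it runs without a server; the `balance` table, the `charge` function and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balance (method TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO balance VALUES (?, ?)",
                 [("prepaid", 5.0), ("paypal", 100.0)])
conn.commit()


def balances(conn):
    """Return the current balances as a {method: amount} dict."""
    return dict(conn.execute("SELECT method, amount FROM balance"))


def charge(conn, total, primary_ok):
    """Charge the prepaid card first, then the remainder to the primary method."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            (prepaid,) = conn.execute(
                "SELECT amount FROM balance WHERE method = 'prepaid'").fetchone()
            from_card = min(prepaid, total)
            conn.execute(
                "UPDATE balance SET amount = amount - ? WHERE method = 'prepaid'",
                (from_card,))
            if not primary_ok:
                # Simulates the primary payment method declining the charge.
                raise RuntimeError("primary payment method declined")
            conn.execute(
                "UPDATE balance SET amount = amount - ? WHERE method = 'paypal'",
                (total - from_card,))
        return True
    except RuntimeError:
        return False


failed = charge(conn, 20.0, primary_ok=False)
after_failure = balances(conn)   # prepaid charge was rolled back
succeeded = charge(conn, 20.0, primary_ok=True)
after_success = balances(conn)
```

When the second step fails, the prepaid deduction made in the first step is undone together with it, so the card is never left partially charged.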

But one of the weak spots of such databases is that they don't have a flexible column model.

HBase - Apache HBase™ Home.

Understanding Hadoop Clusters and the Network. This article is Part 1 in a series that will take a closer look at the architecture and methods of a Hadoop cluster, and how it relates to the network and server infrastructure.

Understanding Hadoop Clusters and the Network

The content presented here is largely based on academic work and conversations I’ve had with customers running real production clusters. If you run production Hadoop clusters in your data center, I’m hoping you’ll provide your valuable insight in the comments below. Subsequent articles will cover the server and network architecture options in closer detail. Before we do that though, let’s start by learning some of the basics about how a Hadoop cluster works. OK, let’s get started!

The three major categories of machine roles in a Hadoop deployment are Client machines, Master nodes, and Slave nodes. Client machines have Hadoop installed with all the cluster settings, but are neither a Master nor a Slave. In real production clusters there is no server virtualization, no hypervisor layer.

Cheers, Brad.

Column-oriented DBMS. A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data.

Column-oriented DBMS

In comparison, most relational DBMSs store data in rows. A column-oriented DBMS has advantages for data warehouses, customer relationship management (CRM) systems, library card catalogs, and other ad hoc inquiry systems[1] where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of column-oriented and row-oriented organization with any DBMS. Denoting one as column-oriented refers to both the ease of expression of a column-oriented structure and the focus on optimizations for column-oriented workloads.[2][1] This approach is in contrast to row-oriented or row store databases and to correlation databases, which use a value-based storage structure.

Description

Background

The most expensive operations involving hard drives are seeks.
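To make the row-versus-column distinction described above concrete, here is a minimal sketch of the same small table held in both layouts. It only illustrates the logical organization: Python dictionaries and lists say nothing about on-disk layout or seek cost, and the table contents and function names are made up for the example.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "salary": 40000},
    {"id": 2, "name": "Bob", "salary": 50000},
    {"id": 3, "name": "Cat", "salary": 45000},
]

# Column-oriented layout: each field's values are kept together.
columns = {field: [row[field] for row in rows] for field in rows[0]}


def total_row_store(rows, field):
    """Aggregate by visiting every record and picking one field out of it."""
    return sum(row[field] for row in rows)


def total_column_store(columns, field):
    """Aggregate by reading a single contiguous list of values."""
    return sum(columns[field])


row_total = total_row_store(rows, "salary")
column_total = total_column_store(columns, "salary")
```

Both functions return the same total. The difference is that the column layout touches only the `salary` values, which is the access pattern that favours aggregates over large numbers of similar data items.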