NoSQL Benchmarking NoSQL is the talk of the town, and we have already covered what it is for in one of our previous blogs. Today I would like to share the results of the NoSQL benchmark tests we recently conducted. They should help you decide whether a system you are about to develop is a good fit for NoSQL, and which NoSQL product to select. In this article we reveal the characteristics of Cassandra, HBase and MongoDB identified through multiple workloads. Why NoSQL? Interest in NoSQL continues to rise because the amount of data to process keeps increasing. Why do organizations use NoSQL instead of an RDBMS? Twitter is still using MySQL. RDBMSs are known to struggle when processing terabyte- or petabyte-scale data. There is no single correct answer for processing bulk data. Among RDBMSs, Oracle is an exception, since Oracle’s performance and features, such as mass data processing and data synchronization, are far superior to those of other RDBMSs. Benchmarking Tests using YCSB The test workload is as follows.
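As a rough illustration of what a YCSB workload specifies, here is a minimal Python sketch that generates an operation stream from a read/update mix. The proportions for YCSB's core workloads A, B and C are the ones YCSB ships with; the generator itself, its name `generate_ops`, and the uniform key choice are simplifications for illustration, not YCSB code.

```python
import random
from collections import Counter

# Read/update proportions of YCSB core workloads A, B and C.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},  # update heavy
    "B": {"read": 0.95, "update": 0.05},  # read mostly
    "C": {"read": 1.00, "update": 0.00},  # read only
}

def generate_ops(workload, n_ops, record_count, seed=42):
    """Yield (operation, key) pairs following the workload's operation mix.

    Simplification: keys are drawn uniformly; YCSB's own core workloads
    default to a Zipfian request distribution instead.
    """
    rng = random.Random(seed)
    mix = WORKLOADS[workload]
    ops = list(mix)
    weights = [mix[op] for op in ops]
    for _ in range(n_ops):
        op = rng.choices(ops, weights=weights)[0]
        key = f"user{rng.randrange(record_count)}"
        yield op, key

# Tally the operations in a 10,000-operation run of workload B.
counts = Counter(op for op, _ in generate_ops("B", 10_000, record_count=1_000))
```

A real benchmark would feed such a stream to a database client and record throughput and latency per operation type; the choice of request distribution matters because a skewed (Zipfian) distribution concentrates load on a few hot keys.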
Shared Nothing vs. Shared Disk Architectures: An Independent View (www.BenStopford.com) The Shared Nothing Architecture is a relatively old pattern that has seen a resurgence of late in data storage technologies, particularly in the NoSQL, Data Warehousing and Big Data spaces. As architectures go, there are fairly dramatic performance trade-offs between the two. This article contrasts Shared Nothing with Shared Disk Architectures, a comparison that is largely equivalent to the trade-offs between sharding and replication. Shared Disk and Shared Nothing? Shared Nothing is a data architecture for distributed data storage in a clustered environment, in which data is partitioned across the nodes and each node has exclusive ownership of its partition. By comparison, Shared Disk is exactly what it says: disk accessible from all cluster nodes. In Shared Disk, any node can access any piece of data, and no single piece of data has a dedicated owner. Understanding the Trade-offs for Writing When persisting data in a Shared Disk architecture, writes can be performed against any node. To explain this a little further, consider the case described by the diagram below. So Which Should You Use?
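To make the ownership difference concrete, here is a minimal Python sketch, not taken from the article. In the Shared Nothing half, a stable hash picks the single node that owns a key. In the Shared Disk half, any node may write any key, and a toy in-memory lock table (an assumption standing in for whatever coordination a real Shared Disk system uses) shows why concurrent writers must coordinate. The node names and the MD5-modulo partitioning are illustrative choices.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def owner(key: str, nodes=NODES) -> str:
    """Shared Nothing: each key has exactly one owning node.

    A stable hash (MD5 here, rather than Python's per-process randomized
    hash()) maps the key to a node; reads and writes for it go there.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

class SharedDisk:
    """Shared Disk: one store visible to every node, so any node may
    write any key; a toy lock table stands in for the coordination
    that concurrent writers need."""

    def __init__(self):
        self.disk = {}   # data reachable from all nodes
        self.locks = {}  # key -> node currently holding its lock

    def acquire(self, node: str, key: str) -> bool:
        # Succeeds if the key is unlocked or already locked by this node.
        if self.locks.get(key, node) != node:
            return False
        self.locks[key] = node
        return True

    def release(self, node: str, key: str) -> None:
        if self.locks.get(key) == node:
            del self.locks[key]

    def write(self, node: str, key: str, value) -> bool:
        if not self.acquire(node, key):
            return False  # another node holds the lock; caller must retry
        try:
            self.disk[key] = value
        finally:
            self.release(node, key)
        return True
```

Plain modulo hashing moves most keys when a node joins or leaves; consistent hashing is the usual remedy in sharded stores.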
What the heck are you actually using NoSQL for? It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is that this is not helpful advice without being able to answer more specific questions, like: What jobs are the tools good at? Will they work on jobs like mine? In the NoSQL space this kind of real-world data is still a bit vague. Let's change that. Here's a list of use cases I came up with after some trolling of the interwebs. General Use Cases These are the general kinds of reasons people throw around for using NoSQL. Bigness. More Specific Use Cases Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc. Syncing online and offline data. Redis Use Cases Redis is unique in the repertoire as it is a data structure server, with many fascinating use cases that people are excited to share. Calculating whose friends are online using sets. VoltDB Use Cases Analytics Use Cases How many requests do we serve each day? Poor Use Cases
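The Redis use case "calculating whose friends are online using sets" boils down to a set intersection. Here is a hedged sketch in plain Python, with a dict of sets standing in for the Redis keyspace and hypothetical key names (`friends:alice`, `online`); against a real Redis server the same query would be the single command `SINTER friends:alice online`, computed server-side.

```python
# Redis keys modelled as plain Python sets (hypothetical key names and members).
store = {
    "friends:alice": {"bob", "carol", "dave"},
    "online": {"carol", "erin", "dave"},
}

def sinter(*keys):
    """Return the intersection of the sets stored at the given keys.

    Like Redis SINTER, a missing key counts as an empty set, so the
    result is then empty.
    """
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()

# Alice's friends who are currently online.
online_friends = sinter("friends:alice", "online")
```

Keeping the online set up to date is then a matter of adding a user on login and removing them on logout (or expiry).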
Document Databases Compared: CouchDB, MongoDB, RavenDB Brian Ritchie has two posts covering three document databases, CouchDB, MongoDB and RavenDB, concluding with the matrix below. But before using it as reference material, a couple of corrections are needed: "They have some special characteristics that make them kick some serious SQL." "Objects can be stored as documents: The relational database impedance mismatch is gone." Judging by the growing number of document database mapping tools, I’m not sure the impedance mismatch is really gone (related to the 1st point above). Using embedded format is not always the best solution for mapping relationships and other more complex data structures. Related to the matrix comparison: versioning is supported by neither MongoDB nor CouchDB. Original title and link for this post: Document Databases Compared: CouchDB, MongoDB, RavenDB (published on the NoSQL blog: myNoSQL) by Alex Popescu & Ana-Maria Bacalu
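The point about embedded format can be illustrated with plain Python dicts standing in for documents; the field names and data below are hypothetical, and no particular database's API is implied. Embedding copies the author into every post, so a rename must touch every copy, while a reference keeps a single author document at the cost of an extra lookup when reading a post.

```python
import copy

# Embedded: each post carries its own copy of the author document.
author = {"_id": "u1", "name": "Sam"}
embedded_posts = [
    {"_id": "p1", "title": "Intro", "author": copy.deepcopy(author)},
    {"_id": "p2", "title": "Follow-up", "author": copy.deepcopy(author)},
]

# Referenced: posts store only the author's id; authors live separately.
authors = {"u1": {"_id": "u1", "name": "Sam"}}
referenced_posts = [
    {"_id": "p1", "title": "Intro", "author_id": "u1"},
    {"_id": "p2", "title": "Follow-up", "author_id": "u1"},
]

def rename_embedded(posts, author_id, new_name):
    """Renaming an embedded author means rewriting every post that embeds it.
    Returns the number of documents changed."""
    touched = 0
    for post in posts:
        if post["author"]["_id"] == author_id:
            post["author"]["name"] = new_name
            touched += 1
    return touched

def rename_referenced(authors, author_id, new_name):
    """With references, only one document changes; readers resolve
    author_id at read time. Returns the number of documents changed."""
    authors[author_id]["name"] = new_name
    return 1
```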
BIG Data Analytics Pipeline "Big Data Analytics" has recently been one of the hottest buzzwords. It is a combination of "Big Data" and "Deep Analysis". The former is a Web 2.0 phenomenon: a lot of transaction and user activity data has been collected, and it can be mined to extract useful information. Big Data Camp People working in this camp typically come from a Hadoop, Pig/Hive background. From my personal experience, most of the people working in big data come from a computer science and distributed parallel processing background, not from a statistical or mathematical discipline. Deep Analysis Camp On the other hand, people working in this camp usually come from a statistical and mathematical background, where the first thing taught is how to use sampling to understand a large population's characteristics. Typical Data Processing Pipeline Learning from my previous projects, I observe that most data processing pipelines fall into the following pattern. Big Data + Deep Analysis
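One common way to apply sampling when the data arrives as a stream too large to hold in memory is reservoir sampling (Algorithm R). It is chosen here as an illustration and is not necessarily what the author uses. The sketch draws a uniform sample of `k` items in a single pass and estimates the population mean from it; the population (`range(100_000)`), sample size and seed are arbitrary.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Algorithm R: a uniform random sample of k items from a stream of
    unknown length, using O(k) memory and a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)     # uniform in [0, i], inclusive
            if j < k:
                sample[j] = item      # keep item with probability k / (i + 1)
    return sample

population = range(100_000)
sample = reservoir_sample(population, k=1_000, seed=7)
estimate = sum(sample) / len(sample)  # sample mean as an estimate of the population mean
```

The true mean here is 49,999.5, and with 1,000 samples the estimate typically lands within about a thousand of it, which is the statistician's point: a modest sample often answers the question without processing every record.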
Traditional database design vs. key-value table Consider the following two scenarios: Scenario no. 1 – traditional database design method The DBA will use the following SQL statements to create two additional columns: In our example, we will update the data that already exists in the table with the following UPDATE statements: This will make our table look like this: We might want to be able to make a new descriptive change without altering the database schema structure. Scenario no. 2 – use of key-value table The following scenario describes a generic approach to the challenge. The DBA can create an additional table which will hold any additional information that describes the employee better. Updating existing data in our example can be done using the following INSERT statements: This would yield the following table: This information is stored in a more generic structure, allowing the programmer to add employee attributes on their own, as well as to create a mechanism that adds employee attributes at run-time.
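A hedged sketch of both scenarios, using Python's built-in `sqlite3` module and an in-memory database. The table names (`employees`, `employee_attributes`), columns and values are hypothetical stand-ins, not the article's own statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO employees (id, name) VALUES (?, ?)",
                [(1, "Dana"), (2, "Lee")])

# Scenario 1: change the schema, then UPDATE the existing rows.
cur.execute("ALTER TABLE employees ADD COLUMN phone TEXT")
cur.execute("ALTER TABLE employees ADD COLUMN office TEXT")
cur.execute("UPDATE employees SET phone = ?, office = ? WHERE id = ?",
            ("555-0101", "B12", 1))

# Scenario 2: a generic key-value table; a new attribute is just a new row.
cur.execute("""CREATE TABLE employee_attributes (
    employee_id INTEGER REFERENCES employees(id),
    attr_name   TEXT,
    attr_value  TEXT,
    PRIMARY KEY (employee_id, attr_name))""")
cur.executemany(
    "INSERT INTO employee_attributes VALUES (?, ?, ?)",
    [(1, "phone", "555-0101"), (1, "office", "B12"),
     (2, "badge_color", "green")],  # added without any ALTER TABLE
)
conn.commit()

# Fetch every attribute recorded for employee 1.
rows = cur.execute(
    "SELECT attr_name, attr_value FROM employee_attributes "
    "WHERE employee_id = ? ORDER BY attr_name", (1,)
).fetchall()
```

The flexibility has a price: every value lands in a single TEXT column, so per-attribute types and constraints are lost, and reassembling an employee's attributes into one row requires a pivot-style query.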
Guidelines for Modeling and Optimizing NoSQL Databases - LaunchAny eBay Architect Jay Patel recently posted an article about data modeling using the Cassandra data store. In his article, he breaks down how they modeled their data using Cassandra, how they approached the use of Columns and Column Families, and how they optimized queries. The post is very detailed and a great read. What I enjoyed most about the article was the high-level approach that Jay and his team took. “It’s important to understand and start with entities and relationships…” Jay reminds us that we must first understand the problem domain, then model the entities involved and the relationships between the data. “…then continue modeling around query patterns by de-normalizing and duplicating.” You cannot optimize your data model until you understand how you will be accessing it. “Remember that there are many ways to model. Always evaluate your data model based on the intended use cases.”
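The advice to model "around query patterns by de-normalizing and duplicating" can be sketched with a hypothetical "user likes item" relationship. In this Python toy, each dict of dicts stands in for a column family (row key to columns), and the relationship is written twice so that each of two queries becomes a single-row lookup. The names are illustrative and are not taken from Jay Patel's article or from any Cassandra driver API.

```python
from collections import defaultdict

# Two "column families" modelled as row key -> {column name: value}.
# Both store the same relationship, each keyed for one query pattern.
items_by_user = defaultdict(dict)  # row key: user id, columns: item ids
users_by_item = defaultdict(dict)  # row key: item id, columns: user ids

def record_like(user_id, item_id, timestamp):
    """Write the relationship twice (de-normalized) so both queries
    are single-row reads with no join."""
    items_by_user[user_id][item_id] = timestamp
    users_by_item[item_id][user_id] = timestamp

def items_liked_by(user_id):
    # Query pattern 1: one row read from items_by_user.
    return sorted(items_by_user.get(user_id, {}))

def users_who_liked(item_id):
    # Query pattern 2: one row read from users_by_item.
    return sorted(users_by_item.get(item_id, {}))
```

The trade-off is the usual one for de-normalization: writes do double work and the two copies must be kept consistent, in exchange for cheap reads on the queries the model was designed for.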
NoSQL Relational Database Management System: Home Page NoSQL is a fast, portable, relational database management system without arbitrary limits (other than memory and processor speed) that runs under, and interacts with, the UNIX Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March 1991, page 24, in an article entitled "A 4GL Language". What is NoSQL NoSQL, which I personally like to pronounce as noseequel, is a derivative of the RDB database system. Other major contributors to the original RDB system, besides Walter Hobbs, are: Chuck Bush, Don Emerson, Judy Lender, Roy Gates and Rae Starr. People who helped with turning RDB into NoSQL: Vincenzo (Vicky) Belloli, David Frey, Giuseppe Paternò, Maurizio (Masar) Sartori, Paul Lussier, Seth LaForge, Micah Stetson, Thomas Miller, Michael Somos and Agustín Ferrin. The NoSQL logo was kindly provided by Kyle Hart. As its name implies, NoSQL is not an SQL database but rather a shell-level tool, as explained in Philosophy of NoSQL. What NoSQL is not How to get NoSQL Support
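NoSQL's actual operators are shell commands joined by Unix pipes, and their names and options are not shown here. The following Python generator chain is therefore only an analogy for the operator-stream idea: each operator consumes a stream of rows and emits a stream of rows, so operators compose like piped commands. The tab-separated table with a header line, and the operator names `read_table`, `select` and `project`, are assumptions for illustration.

```python
import io

# A small table: header line, then tab-separated rows (hypothetical data).
TABLE = "name\tdept\tsalary\nann\tops\t52000\nbob\tdev\t61000\ncid\tdev\t58000\n"

def read_table(lines):
    """Parse a header line followed by tab-separated rows into dicts."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    for line in it:
        yield dict(zip(header, line.rstrip("\n").split("\t")))

def select(rows, predicate):
    """Row-filter operator: pass through only rows matching the predicate."""
    return (r for r in rows if predicate(r))

def project(rows, *columns):
    """Column operator: keep only the named columns, in the given order."""
    return ({c: r[c] for c in columns} for r in rows)

# Roughly "read | select dept == dev | project name salary" as a pipeline.
pipeline = project(
    select(read_table(io.StringIO(TABLE)), lambda r: r["dept"] == "dev"),
    "name", "salary",
)
result = list(pipeline)
```

Because each stage is lazy, rows flow through one at a time, much as data flows between processes in a shell pipeline.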