High Speed Database

Database normalization is a technique for designing relational database schemas that ensures the data is optimal for ad-hoc querying and that modifications such as deletions or insertions do not lead to data inconsistency.

Building Scalable Databases: Denormalization, the NoSQL Movement and Digg

Database denormalization is the process of optimizing your database for reads by creating redundant data. A consequence of denormalization is that insertions or deletions can cause data inconsistency if they are not uniformly applied to all redundant copies of the data within the database.

Why Denormalize Your Database? Today, lots of Web applications have "social" features. A consequence of this is that whenever I look at content or a user in such a service, there is always additional content from other users that also needs to be pulled into the page. Denormalizing this data optimizes your reads at the cost of incurring more writes to the system.

The Problem: in both models, we're computing the intersection of two sets: users who dugg an item, and users followed by the person viewing it.
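To make the read/write trade-off concrete, here is a hypothetical sketch of the Digg-style query: which of the users I follow dugg this item? All names and data structures below are illustrative stand-ins, not Digg's actual schema.

```python
# Normalized model: one copy of each fact; compute the answer at read time.
diggs = {"item42": {"alice", "bob", "carol"}}   # item -> users who dugg it
follows = {"me": {"bob", "dave"}}               # viewer -> users they follow

def friends_who_dugg_normalized(viewer, item):
    # Read-time set intersection: dugg the item AND followed by the viewer.
    return diggs.get(item, set()) & follows.get(viewer, set())

# Denormalized model: precompute the intersection at write time, storing a
# redundant copy per (viewer, item) pair. Reads become a single lookup, but
# every digg now fans out into extra writes.
friend_diggs = {}   # (viewer, item) -> followed users who dugg it

def record_digg(user, item):
    diggs.setdefault(item, set()).add(user)
    for viewer, followed in follows.items():
        if user in followed:
            friend_diggs.setdefault((viewer, item), set()).add(user)

print(friends_who_dugg_normalized("me", "item42"))   # {'bob'}
```

The inconsistency risk described above is visible here: if `record_digg` updates `diggs` but the fan-out loop fails partway, some viewers' redundant copies silently disagree with the source of truth.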

Some quotes and comments from a (quite long) interview with Eliot Horowitz, CTO of 10gen, the creators of MongoDB: I think the first question you have to ask about any database these days is, “What’s the data model?”

CTO of 10gen, MongoDB creators: We are sort of similar to MySQL or PostgreSQL in terms of how you could use us « myNoSQL

The only thing I’d add is: “… and how does that fit my problem?” That whole class of problems exists because there is a very clunky mapping from objects to relational databases. With document databases, that mapping becomes much simpler. I only partially agree with this. I also think […] that the object databases of the past were actually more closely related to current graph databases than to document databases. I assume the connection with graph databases is based on the following argument: the connectivity between objects can be very rich, and while all of that can be persisted, it is not accomplished in a transparent way.
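The "clunky mapping" point can be illustrated with a small sketch (this is not MongoDB's actual API; the schema and names are invented for the example): a nested object round-trips through a document store as-is, while a relational mapping must decompose it into rows and re-join them on read.

```python
blog_post = {
    "title": "Scaling reads",
    "author": "eliot",
    "comments": [                      # nested list lives inside the object
        {"user": "ann", "text": "+1"},
        {"user": "bo",  "text": "nice"},
    ],
}

# Document model: persist the object in one piece; one write, one read.
document_store = {}
document_store["post:1"] = blog_post

# Relational mapping: the same object is split across two tables and
# must be reassembled with a join-like pass at read time.
posts_table = [("post:1", "Scaling reads", "eliot")]
comments_table = [("post:1", "ann", "+1"), ("post:1", "bo", "nice")]

def load_post_relational(post_id):
    pid, title, author = next(r for r in posts_table if r[0] == post_id)
    comments = [{"user": u, "text": t}
                for p, u, t in comments_table if p == post_id]
    return {"title": title, "author": author, "comments": comments}

assert load_post_relational("post:1") == document_store["post:1"]
```

Every nested collection in the object graph adds another table and another reassembly step on the relational side, which is the friction the interview is describing.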

If you look at our road map for this year, there’s no one big feature. I totally agree.


In two weeks we’ll present a paper on the Dynamo technology at SOSP, the prestigious biannual Operating Systems conference.

Amazon's Dynamo

Dynamo is internal technology developed at Amazon to address the need for an incrementally scalable, highly available key-value storage system. The technology is designed to give its users the ability to trade off cost, consistency, durability and performance, while maintaining high availability. Let me emphasize the internal technology part before it gets misunderstood: Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3. We submitted the technology for publication in SOSP because many of the techniques used in Dynamo originate in the operating systems and distributed systems research of the past years: DHTs, consistent hashing, versioning, vector clocks, quorum, anti-entropy based recovery, etc.

CodeFutures offers an effective sharding solution with our product, dbShards.
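Of the techniques the Dynamo post lists, consistent hashing is the one that determines which node owns which key. Here is a minimal, illustrative sketch (the hash choice, node names, and virtual-node count are assumptions for the example, not Dynamo's actual parameters):

```python
import hashlib
from bisect import bisect

def h(key):
    # Map any string to a point on the hash ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8):
        # Give each node several virtual positions on the ring to
        # smooth out the key distribution across nodes.
        self.points = sorted((h(f"{n}-{i}"), n)
                             for n in nodes for i in range(vnodes))

    def node_for(self, key):
        # Walk clockwise to the first ring position at or after hash(key);
        # wrap around to the start of the ring if we fall off the end.
        hashes = [p for p, _ in self.points]
        idx = bisect(hashes, h(key)) % len(self.points)
        return self.points[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")   # same key always routes to the same node
```

The property that motivates the design: adding or removing a node only remaps the keys between that node's ring positions and their predecessors, rather than reshuffling every key as naive modulo hashing would.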

Database Sharding

Our customers have used dbShards to achieve unprecedented performance, on the order of hundreds of millions of reads and millions of writes every day.

The Rise of Database Sharding. The concept of Database Sharding has been gaining popularity over the past several years, due to the enormous growth in transaction volume and size of business application databases. This is particularly true for many successful online service providers, Software as a Service (SaaS) companies, and social networking Web sites. Database Sharding can be simply defined as a “shared-nothing” partitioning scheme for large databases across a number of servers, making new levels of database performance and scalability achievable. The term “sharding” was coined by Google engineers and popularized through their publication of the Bigtable architecture. What Drives the Need for Database Sharding?
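The “shared-nothing” definition above can be sketched in a few lines. This is a hypothetical illustration, not dbShards' implementation: each shard owns a disjoint slice of the key space and would run on its own server; the router's only job is deciding where a row lives.

```python
import hashlib

NUM_SHARDS = 4
# Dicts stand in for four independent database servers sharing nothing.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key):
    # Stable hash so the same key always routes to the same shard.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, row):
    shards[shard_for(key)][key] = row

def get(key):
    return shards[shard_for(key)].get(key)

put("customer:1001", {"name": "Ada"})
assert get("customer:1001") == {"name": "Ada"}
```

Because keys spread across shards, read and write capacity grows by adding servers rather than by scaling up one machine; the trade-off is that queries spanning many keys must now fan out across shards.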

Figure 1. CPU, memory, and disk.