NoSQL Design

BIG Data Analytics Pipeline. "Big Data Analytics" has recently been one of the hottest buzzwords.

BIG Data Analytics Pipeline

It is a combination of "Big Data" and "Deep Analysis". The former is a phenomenon of Web 2.0, where a lot of transaction and user activity data has been collected that can be mined to extract useful information. The latter is about using advanced mathematical/statistical techniques to build models from the data.

Trees in MongoDB. To model hierarchical or nested data relationships, you can use references to implement tree-like structures.

Trees in MongoDB

The following Tree pattern examples model book categories that have hierarchical relationships.

Model Tree Structures with Child References. The Child References pattern stores each tree node in a document; in addition to the tree node, the document stores the id(s) of the node’s children in an array. Consider the following hierarchy of categories:
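The Child References pattern described in this excerpt can be sketched without a database. The Python below is a minimal illustration, not the MongoDB manual's own code: plain dicts stand in for documents, the category names are assumed for the example (the excerpt does not list them), and `children_of` and `parent_of` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Each dict stands in for one document: "_id" names the tree node and
# "children" holds the ids of its direct children (the child references).
# The category names are assumptions made for this illustration.
CATEGORIES: List[dict] = [
    {"_id": "MongoDB", "children": []},
    {"_id": "dbm", "children": []},
    {"_id": "Databases", "children": ["MongoDB", "dbm"]},
    {"_id": "Languages", "children": []},
    {"_id": "Programming", "children": ["Databases", "Languages"]},
    {"_id": "Books", "children": ["Programming"]},
]

# Index the documents by id, as a lookup on "_id" would.
BY_ID: Dict[str, dict] = {doc["_id"]: doc for doc in CATEGORIES}


def children_of(node_id: str) -> List[str]:
    """Return the direct children of a node: a single document lookup."""
    return list(BY_ID[node_id]["children"])


def parent_of(node_id: str) -> Optional[str]:
    """Return the parent of a node by searching every children array."""
    for doc in CATEGORIES:
        if node_id in doc["children"]:
            return doc["_id"]
    return None  # the root has no parent


print(children_of("Databases"))
print(parent_of("MongoDB"))
```

In this layout, reading a node's children is one lookup, while finding a parent means searching the children arrays, which is the trade-off to weigh when picking this pattern.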

Mike Hillyer's Personal Webspace - Managing Hierarchical Data in MySQL

Adjacency list vs. nested sets: PostgreSQL. This series of articles is inspired by numerous questions asked on the site and on Stack Overflow.

Adjacency list vs. nested sets: PostgreSQL

Which is better for storing hierarchical data: the nested sets model or the adjacency list (parent-child) model? First, let's explain what all this means.

Adjacency list. Hierarchical relations (not to be confused with the hierarchical data model) are 0-1:0-N transitive relations between entities of the same domain.

Storing Hierarchical Data in a Database Article. Now, let’s have a look at another method for storing trees.

Storing Hierarchical Data in a Database Article

Recursion can be slow, so we would rather not use a recursive function. We’d also like to minimize the number of database queries. Preferably, we’d have just one query for each activity. We’ll start by laying out our tree in a horizontal way. Start at the root node (‘Food’), and write a 1 to its left. We’ll call these numbers left and right (e.g. the left value of ‘Food’ is 1, the right value is 18). Before we continue, let’s see how these values look in our table: Note that the words ‘left’ and ‘right’ have a special meaning in SQL.

Retrieve the Tree. If you want to display the tree using a table with left and right values, you’ll first have to identify the nodes that you want to retrieve.
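The left/right numbering described in this excerpt can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the article's own code: only ‘Food’ and its values 1 and 18 come from the excerpt, the other eight category names are invented so that the tree has nine nodes, and `number_tree` and `descendants` are hypothetical helper names. The input is an adjacency-style parent-to-children mapping, so the sketch also shows how one model can be converted into the other.

```python
from typing import Dict, List, Tuple

# Adjacency-style input: parent -> ordered list of children.
# Only "Food" appears in the excerpt; the other names are assumed.
CHILDREN: Dict[str, List[str]] = {
    "Food": ["Fruit", "Meat"],
    "Fruit": ["Red", "Yellow"],
    "Red": ["Cherry"],
    "Yellow": ["Banana"],
    "Meat": ["Beef", "Pork"],
}


def number_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (left, right) values by walking the tree with an explicit stack."""
    bounds: Dict[str, Tuple[int, int]] = {}
    lefts: Dict[str, int] = {}
    counter = 1
    stack = [(root, False)]  # (node, True once its whole subtree is numbered)
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            bounds[node] = (lefts[node], counter)  # write the right value
        else:
            lefts[node] = counter  # write the left value
            stack.append((node, True))
            # Push children in reverse so the first child is visited first.
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        counter += 1
    return bounds


def descendants(bounds: Dict[str, Tuple[int, int]], node: str) -> List[str]:
    """Select every node whose interval lies strictly inside the given node's."""
    left, right = bounds[node]
    inside = [(l, name) for name, (l, r) in bounds.items() if left < l and r < right]
    return [name for _, name in sorted(inside)]


BOUNDS = number_tree(CHILDREN, "Food")
print(BOUNDS["Food"])
print(descendants(BOUNDS, "Fruit"))
```

The `descendants` filter is the in-memory counterpart of a single range query on the left and right columns: a whole subtree comes back in one pass, ordered by left value, with no recursion at read time.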

What Every Developer Should Know About Database Scalability

Cassandra Data Modeling Best Practices, Part 1 — eBay Tech Blog

Guidelines for Modeling and Optimizing NoSQL Databases - LaunchAny. eBay Architect Jay Patel recently posted an article about data modeling using the Cassandra data store.

Guidelines for Modeling and Optimizing NoSQL Databases - LaunchAny

In his article, he breaks down how they modeled their data using Cassandra, how they approached the use of Columns and Column Families, and how they optimized queries. The post is very detailed and a great read. What I enjoyed most about the article was the high-level approach that Jay and his team took. Here are my favorite takeaways from their approach to data modeling and query optimization, which I believe can be applied to any NoSQL database, including Cassandra, MongoDB, Redis, and others.

“It’s important to understand and start with entities and relationships…” Jay reminds us that we must first understand the problem domain, model the entities involved, and the relationships between the data.

JOINs via denormalization for NoSQL coders, Part 2: Materialized views - Web development blog. Thomas Wanschik on September 27, 2010. In part 1 we discussed a workaround for JOINs on non-relational databases using denormalization in cases for which the denormalized properties of the to-one side don't change.

JOINs via denormalization for NoSQL coders, Part 2: Materialized views - Web development blog

In this post we'll show one way to handle JOINs for mutable properties of the to-one side, i.e. properties of users. Let's summarize our current situation:

NoSQL Data Modeling Techniques « Highly Scalable Blog. NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency.

NoSQL Data Modeling Techniques « Highly Scalable Blog

This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage and fundamental results on distributed systems like the CAP theorem apply well to NoSQL systems. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques. I would like to thank Daniel Kirkdorffer who reviewed the article and cleaned up the grammar. To explore data modeling techniques, we have to start with a more or less systematic view of NoSQL data models that preferably reveals trends and interconnections.

What the heck are you actually using NoSQL for? It's a truism that we should choose the right tool for the job.

What the heck are you actually using NoSQL for?

Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? In the NoSQL space this kind of real-world data is still a bit vague. Let's change that. Here's a list of use cases I came up with after some trolling of the interwebs.

General Use Cases. These are the general kinds of reasons people throw around for using NoSQL. Bigness.

More Specific Use Cases.