
Data


5 Graph Databases to Consider. Of the major categories of NoSQL databases - document-oriented databases, key-value stores and graph databases - we've given the least attention to graph databases on this blog. That's a shame, because, as many have pointed out, it may become the most significant category. Graph databases apply graph theory to the storage of information about the relationships between entries. The relationships between people in social networks are the most obvious example.

The relationships between items and attributes in recommendation engines are another. Yes, many have noted the irony that relational databases aren't good at storing relationship data. Adam Wiggins from Heroku has a lucid explanation of why that is here. Google has its own graph computing system called Pregel (you can find the paper on the subject here), but there are several commercial and open source graph databases available. Neo4j: Neo Technologies cites several customers, though none of them are household names.

Hadoop - NoSQL - NoSQL and Cloud Databases. What is Hadoop? Hadoop isn't a simple database; it's a bunch of different technologies built on top of the Hadoop common utilities, MapReduce, and HDFS (Hadoop Distributed File System). Each of these products serves a simple purpose - HDFS handles storage, MapReduce is a parallel job distribution system, HBase is a distributed database with support for structured tables. You can find out more on the Apache Hadoop page. How Do You Install It? Installing Hadoop is not quite as easy as installing Cassandra. Cloudera Flavor: If you're running Linux (which is the easiest way to do this), just follow the instructions to use Cloudera's Hadoop repositories. If you don't have a Linux distribution handy, you can download a VM from Cloudera (yeah, it's that easy). If you really want to run Cloudera's Hadoop on Windows, you will need to install Cygwin and create a Linux-like environment.
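To make the MapReduce model mentioned in the excerpt above a little more concrete, here is a minimal single-process sketch in plain Python. It is not the Hadoop API and does not run on a cluster; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are invented for illustration. It only shows the shape of the computation that a framework like Hadoop would distribute across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # In a real cluster the map and reduce calls would run in parallel
    # on different nodes; here they simply run one after another.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input documents, chosen only for this example.
counts = word_count([
    "graph databases store graphs",
    "Hadoop stores data",
    "graph data",
])
print(counts)
```

The point of the split is that each map call sees only one document and each reduce call sees only one key, which is what lets the work be spread over many machines.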

Apache Flavor Which Login and Security Model(s) Does Hadoop Support? Good question! N.B. Or any RDBMS for that matter. Why Use a Graph-Oriented Database? | YarcData. Suppose you worked for a business analysis software company, and your CEO wanted you to look into the possibility of developing a product that would help investment banks detect insider trading. Further suppose that the CEO wanted you to brief her on your proposed technical approach to insider trading detection, and you’re standing in front of a whiteboard with a marker in your hand (you know that she likes hand-sketched diagrams), and you’ve decided to use a fictionalized version of this story you read in Bloomberg as an example.

What would you draw on the whiteboard? I’m thinking it might look something like this: Let’s look at the last three arrows in the diagram. There are other situations when it would be natural to draw a sort of diagram like this. Here we see that your daughter has several friends, most of whom are also friends with each other, but who also have friends she doesn’t know. Let’s consider a different example. What about the investment club membership roster? How Twitter Uses NoSQL. InfoQ has released a video of Twitter's Kevin Weil speaking at Strange Loop earlier this year on how the company uses NoSQL. Weil is quick to point out that Twitter is heavily dependent on MySQL. However, Twitter does employ NoSQL solutions for many purposes for which MySQL isn't ideal.

According to Weil, Twitter users generate 12 terabytes of data a day - about four petabytes per year. And that amount is multiplying every year. Read on for our notes on Weil's talk. Scribe: Syslog stopped scaling for Twitter after a while, so instead it uses Scribe, a log collection framework created and open-sourced by Facebook. Twitter uses Scribe to write logs to Hadoop. Hadoop: Twitter needs to store more data per day than it can reliably write to a single hard drive, so it needs to store data on clusters. Weil says MySQL isn't efficient at doing analytics at the scale Twitter needs. Pig: This Pig script finds the top five pages of your site visited by people aged 18 to 25. HBase.
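The Pig script referred to in the excerpt is not reproduced on this page. As a rough illustration of the same query (top five pages visited by people aged 18 to 25), here is a small Python sketch of the filter, join, group and order steps such a script would typically perform. The data, field names and the `top_pages` function are all made up for the example and are not taken from Weil's talk.

```python
from collections import Counter

# Hypothetical sample data; the field names are assumptions.
users = [
    {"name": "alice", "age": 22},
    {"name": "bob", "age": 31},
    {"name": "carol", "age": 19},
    {"name": "dave", "age": 25},
]
page_views = [
    ("alice", "/home"), ("alice", "/about"), ("carol", "/home"),
    ("bob", "/home"), ("dave", "/pricing"), ("carol", "/pricing"),
    ("dave", "/home"),
]

def top_pages(users, page_views, min_age=18, max_age=25, n=5):
    # Filter: keep only users inside the age range.
    eligible = {u["name"] for u in users if min_age <= u["age"] <= max_age}
    # Join and group: count page views that belong to an eligible user.
    counts = Counter(url for user, url in page_views if user in eligible)
    # Order and limit: the n most visited pages, highest count first.
    return counts.most_common(n)

result = top_pages(users, page_views)
print(result)
```

On a single machine this is trivial; the reason to express it in Pig is that the same few steps then run as MapReduce jobs over data too large for one disk.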

Neo4j - a Graph Database that Kicks Buttox. Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens." A graph looks something like: For more lovely examples take a look at the Graph Image Gallery.
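The description above (nodes, edges, and key-value properties on both) can be sketched in a few lines of Python. This is an in-memory toy, not Neo4j's API or storage model; the `PropertyGraph` class, its method names, the `KNOWS` relationship type and the sample people are all assumptions made for illustration.

```python
from collections import defaultdict

class PropertyGraph:
    """Toy property graph: nodes and directed edges, each with properties."""

    def __init__(self):
        self.nodes = {}                     # node_id -> properties dict
        self.out_edges = defaultdict(list)  # node_id -> [(rel_type, target, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target, **props):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[source].append((rel_type, target, props))

    def neighbors(self, node_id, rel_type=None):
        """Targets of outgoing edges, optionally filtered by relationship type."""
        return [
            target
            for edge_type, target, _ in self.out_edges.get(node_id, [])
            if rel_type is None or edge_type == rel_type
        ]

    def friends_of_friends(self, node_id, rel_type="KNOWS"):
        """Nodes two hops away that are not already direct neighbors."""
        direct = set(self.neighbors(node_id, rel_type))
        found = set()
        for friend in direct:
            for candidate in self.neighbors(friend, rel_type):
                if candidate != node_id and candidate not in direct:
                    found.add(candidate)
        return found

# Hypothetical social graph.
g = PropertyGraph()
for name, age in [("alice", 30), ("bob", 27), ("carol", 35), ("dave", 41)]:
    g.add_node(name, age=age)
g.add_edge("alice", "KNOWS", "bob", since=2008)
g.add_edge("alice", "KNOWS", "dave", since=2010)
g.add_edge("bob", "KNOWS", "carol", since=2009)
g.add_edge("dave", "KNOWS", "bob", since=2011)

fof = g.friends_of_friends("alice")
print(fof)
```

Because each node holds direct references to its relationships, a friends-of-friends query walks only the edges around the starting node rather than joining whole tables, which is the usual argument for graph databases on this kind of data.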

Here's a good summary by Emil Eifrem, founder of Neo4j, making the case for why graph databases rule: Most applications today handle data that is deeply associative, i.e. structured as graphs (networks). Neo4j -- or why graph dbs kick ass. Home · tinkerpop/gremlin Wiki. Www.rene-pickhardt.de/wp-content/uploads/2011/09/social_news_streams_and_time_indices_on_social-2.pdf. Www.rene-pickhardt.de/wp-content/uploads/2012/11/SocialCom2012Graphity.pdf.

Www.rene-pickhardt.de/wp-content/uploads/2011/11/2012SocialComEfficientGraphModelsForRetrievingTopKNewsFeedsFromEgoNetworks.pdf.