Graph Store

Building a Graph-Based Movie Recommender Engine. A recommender engine helps a user find novel and interesting items within a pool of resources. There are numerous types of recommendation algorithms and a graph can serve as a general-purpose substrate for evaluating such algorithms. This post will demonstrate how to build a graph-based movie recommender engine using the publicly available MovieLens dataset, the graph database Neo4j, and the graph traversal language Gremlin.
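To make the idea of a graph as a substrate for recommendation concrete, here is a minimal sketch in plain Python rather than Gremlin. It is not the code from the post: the toy ratings, the function names, and the 4-star "liked" threshold are all assumptions made for illustration. It walks from a movie to the users who liked it, then on to the other movies those users liked, and counts how often each one is reached.

```python
from collections import Counter

# Hypothetical toy data: (user, movie, stars). Not taken from MovieLens.
RATINGS = [
    ("u1", "Toy Story", 5),
    ("u1", "A Bug's Life", 5),
    ("u1", "Heat", 2),
    ("u2", "Toy Story", 4),
    ("u2", "A Bug's Life", 4),
    ("u2", "Aladdin", 5),
    ("u3", "Toy Story", 5),
    ("u3", "Aladdin", 4),
    ("u3", "Heat", 5),
]


def build_graph(ratings, min_stars=4):
    """Keep only 'liked' edges (stars >= min_stars), indexed in both directions."""
    liked_by_user = {}
    fans_of_movie = {}
    for user, movie, stars in ratings:
        if stars >= min_stars:
            liked_by_user.setdefault(user, set()).add(movie)
            fans_of_movie.setdefault(movie, set()).add(user)
    return liked_by_user, fans_of_movie


def recommend(movie, ratings, min_stars=4):
    """Two-hop walk: movie -> users who liked it -> other movies they liked."""
    liked_by_user, fans_of_movie = build_graph(ratings, min_stars)
    counts = Counter()
    for user in fans_of_movie.get(movie, ()):
        for other in liked_by_user[user]:
            if other != movie:
                counts[other] += 1
    # Most frequently co-liked movies first; order among ties is unspecified.
    return counts.most_common()


if __name__ == "__main__":
    for title, score in recommend("Toy Story", RATINGS):
        print(title, score)
```

Counting paths this way is the simplest form of co-rating collaborative filtering; a graph database lets the same walk be expressed as a traversal instead of hand-built dictionaries.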

Feel free to follow along in the Gremlin console as the post will go step-by-step from data acquisition, to parsing, and ultimately, to traversing.

The MovieRatings Dataset. The GroupLens research group has made available a corpus of movie ratings. There are 3 versions of this dataset: 100 thousand, 1 million, and 10 million ratings. This post makes use of the 1 million ratings version of the dataset.

Getting Started with Gremlin. All of the code examples can be cut and pasted into the Gremlin console or into a Groovy/Java class within a larger application.

Eprints.cs.univie.ac.at/2833/1/europeana_ts_report.pdf.

Parliament High-Performance Triple Store.

Orient - NoSQL document database light, portable and fast. Supports ACID Tx, Indexes, asynch queries, SQL layer, clustering, etc.

Hama -

Svn.aksw.org/papers/2011/ISWC_AKSWBenchmark/public.pdf.

How to Build an SQL Storage Adapter for RDF Data with Ruby - The Datagraph Blog. RDF.rb is approaching two thousand downloads on RubyGems, and while it has good documentation it could still use some more tutorials.

I recently needed to get RDF.rb working with a PostgreSQL storage backend in order to work with RDF data in a Rails 3.0 application hosted on Heroku. I thought I'd keep track of what I did so that I could discuss the notable parts. In this tutorial we'll be implementing an RDF.rb storage adapter called RDF::DataObjects::Repository, which is a simplified version of what I eventually ended up with. If you want the real thing, check it out on GitHub and read the docs. This tutorial will only cover the SQLite backend and won't concern itself with database indexes, performance tweaks, or any other distractions from the essential RDF.rb interfaces we'll focus on. I'll mention, briefly, that I chose DataObjects as the database abstraction layer, but I don't want to dwell on that -- this post is about RDF.

Requirements. Testing First. So where do we start? Each.

MongoGraph - MongoDB Meets the Semantic Web.

Cumulusrdf - RDF Storage in the Cloud. Discover Yourself! Stig Database. Welcome To Apache Incubator Giraph. Rdf3x - RISC-style RDF database engine. Cs-www.cs.yale.edu/homes/dna/papers/sw-graph-scale.pdf.

Making Hadoop 1000x Faster for Graph Problems. Dr. Daniel Abadi, author of the DBMS Musings blog and cofounder of Hadapt, which offers a product improving Hadoop performance by 50x on relational data, is now taking his talents to graph data in "Hadoop's tremendous inefficiency on graph data management (and how to avoid it)", a post that shares the secrets of getting Hadoop to perform 1000x better on graph data.

Analysing graph data is at the heart of important data mining problems. Hadoop is the tool of choice for many of these problems. Hadoop-style MapReduce works best on key-value processing, not graph processing, and can be well over a factor of 1000 less efficient than it needs to be. Hadoop's inefficiency has consequences in the real world. Voila!

That's a 10x * 10x * 10x = 1000x performance improvement on graph problems using techniques that make a lot of sense. What may be less obvious is the whole idea of keeping the Hadoop shell and making the component parts more efficient for graph problems.
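The arithmetic behind the 1000x figure can be checked with a short sketch. It assumes three independent optimizations of roughly 10x each, as the "10x * 10x * 10x" expression suggests. The labels and the one-hour baseline are placeholders, not names or numbers from the source. The point is that independent speedups multiply rather than add.

```python
from math import prod

# Assumed: three independent optimizations of about 10x each.
# The keys are placeholder labels, not the names used in the original post.
SPEEDUPS = {
    "optimization_a": 10,
    "optimization_b": 10,
    "optimization_c": 10,
}


def combined_speedup(factors):
    """Independent speedups compound multiplicatively."""
    return prod(factors)


def runtime_after(baseline_seconds, factors):
    """Estimated runtime once every speedup factor has been applied."""
    return baseline_seconds / combined_speedup(factors)


total = combined_speedup(SPEEDUPS.values())

if __name__ == "__main__":
    # Hypothetical one-hour baseline job, used only to illustrate the scale.
    print(total, runtime_after(3600, SPEEDUPS.values()))
```

If the factors simply added, three 10x improvements would give 30x; because each one applies to the already-improved runtime, the product is 1000x.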