
Graph Processing With Apache Pig. Seven Databases: Neo4j and misunderstanding indexes « Sinking In. The Neo4j chapter of Seven Databases in Seven Weeks has a short discussion of indexing (starting on p241 of the P1.0 version of the PDF).

Seven Databases: Neo4j and misunderstanding indexes « Sinking In

I found it misled me into thinking there were two types of index, when really there are just two ways to query an index. The book creates an index named authors by simply adding a key-value-node triple to it. It says that the resulting index is key-value or hash style, and shows that the node can be retrieved by supplying the key and value:

Installation · dbpedia-spotlight/dbpedia-spotlight Wiki. Throw away the keys: Easy, Minimal Perfect Hashing. In part 1 of this series, I described how to find the closest match in a dictionary of words using a Trie.

Throw away the keys: Easy, Minimal Perfect Hashing

Such searches are useful because users often mistype queries. But tries can take a lot of memory -- so much that they may not even fit in the 2 to 4 GB limit imposed by 32-bit operating systems. In part 2, I described how to build an MA-FSA (also known as a DAWG). The MA-FSA greatly reduces the number of nodes needed to store the same information as a trie. They are quick to build, and you can safely substitute an MA-FSA for a trie in the fuzzy search algorithm. There is a problem. If we need extra information about the words, we can use an additional data structure along with the MA-FSA. Notice that the table needs to store the keys (the words that we want to look up) as well as the data associated with them.

Minimal perfect hashing

Perfect hashing is a technique for building a hash table with no collisions. We use two levels of hash functions.

Rogueleaderr. SIREn: Semantic Information Retrieval Engine. Problems uploading 1bln+ triples. Jexp/batch-import. Sail Implementation · tinkerpop/blueprints Wiki. OpenRDF is the creator of the Sail interface (Storage and Inference Layer).
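As a rough illustration of the two-level minimal perfect hashing idea in the hashing excerpt above, here is a minimal Python sketch. It is not the article's code: the function names (`seeded_hash`, `build_mph`, `lookup`), the FNV-1a-style hash, and the brute-force seed search are all assumptions chosen for brevity. The first-level hash groups keys into buckets; for each bucket a second-level seed is searched for that sends every key in the bucket to a distinct free slot, so the final table holds values only and no keys.

```python
def seeded_hash(seed, key):
    """FNV-1a style 32-bit hash of a string, perturbed by an integer seed (illustrative choice)."""
    h = (0x811C9DC5 ^ seed) & 0xFFFFFFFF
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def build_mph(mapping):
    """Build (seeds, values) for a non-empty dict of string keys.

    seeds[i] is the second-level seed for first-level bucket i (0 means the bucket is empty);
    values holds exactly len(mapping) entries and stores no keys.
    """
    size = len(mapping)
    buckets = [[] for _ in range(size)]
    for key in mapping:
        buckets[seeded_hash(0, key) % size].append(key)

    seeds = [0] * size
    values = [None] * size
    used = [False] * size

    # Place the largest buckets first, while most slots are still free.
    for index in sorted(range(size), key=lambda i: len(buckets[i]), reverse=True):
        bucket = buckets[index]
        if not bucket:
            break  # all remaining buckets are empty
        seed = 1
        while True:
            slots = [seeded_hash(seed, k) % size for k in bucket]
            if len(set(slots)) == len(slots) and not any(used[s] for s in slots):
                break  # every key in this bucket got its own free slot
            seed += 1
        seeds[index] = seed
        for k, s in zip(bucket, slots):
            used[s] = True
            values[s] = mapping[k]
    return seeds, values


def lookup(seeds, values, key):
    """Two hash evaluations, no key comparison: unknown keys return an arbitrary entry."""
    size = len(values)
    seed = seeds[seeded_hash(0, key) % size]
    return values[seeded_hash(seed, key) % size]


words = {"trie": 1, "dawg": 2, "hash": 3, "graph": 4, "index": 5, "sail": 6}
seeds, values = build_mph(words)
print(all(lookup(seeds, values, w) == v for w, v in words.items()))
```

Because the keys are discarded, a lookup for a word that was never inserted still returns some stored value, so a sketch like this only makes sense when membership is checked elsewhere (for example by the MA-FSA). The brute-force seed search is only practical for small key sets.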

Sail Implementation · tinkerpop/blueprints Wiki

Any triple or quad-store developer can implement the Sail interfaces in order to allow third-party developers to work with different stores without having to change their code. This is very handy as different RDF-store implementations are optimized for different types of use cases. In analogy, Sail is like the JDBC of the RDF database world. The Storage And Inference Layer (Sail) API is a low-level System API (SPI) for RDF stores and inferencers. Its purpose is to abstract from the storage and inference details, allowing various types of storage and inference to be used. Many triple and quad-store developers have implemented the Sail interface.

Visualizing RDF Schema inferencing through Neo4J, Tinkerpop, Sail and Gephi - Datablend. Gephi, an open source graph visualization and manipulation software. Neo4J, RDF and Kevin Bacon. Today, I managed to wangle my way into Off the Rails, a train hack day.

Neo4J, RDF and Kevin Bacon

I was helping friends with data mangling: OpenStreetMap, Dbpedia, RDF and Neo4J. It’s funny actually. Way back when, if I said to people that there is some data that fits quite well into graph models, they’d look at me like some kind of dangerous looney. Graphs? Why? Actually, no. If you are trying to model a system where there are trains that travel on tracks between stations, that maps quite nicely to graphs, nodes and edges. Oh, yeah, there is.

Rogueleaderr. [Warning: This is another super-technical post. If you don’t know what the Semantic Web and RDF are, this will be incomprehensible.] In my last post, I talked about my attempt, as a novice programmer currently capable of only rudimentary Python and not much else, to use Neo4j as an RDF triple store so that I could work with the DBpedia dataset on my laptop. Tinkerpop is an open-source set of tools that lets you magically convert Neo4j into a fully functional triplestore. My conclusion from that attempt was that using only Python to set up and control Neo4j for RDF is basically impossible. I’m still determined to accomplish that goal, so my new plan is to just bite the bullet and teach myself “just enough Java” (JeJ. As of six months ago, I knew basically nothing about programming. Claudio martella. DISCLAIMER: this is a bit of a hack, but it should get you started.

claudio martella

I managed to get the core dataset of DBpedia into Neo4J, but this procedure should work for any Blueprints-ready vendor, like OrientDB. Ok, a little background first: we want to store DBpedia inside of a GraphDB, instead of the typical TripleStore, and run SPARQL queries over it. DBpedia is a project aiming to extract structured content from Wikipedia: information such as what you can find in the infoboxes, the links, the categorization information, geo-coordinates etc.

This information is extracted and exported as triples to form a graph, a network of properties and relationships between Wikipedia resources. So we're going to store millions of triples like "Barack Obama -- president of --> United States of America", or "Rome -- capital of --> Italy" etc. and once we have these triples in the store, we can run queries over this graph with a language that is not so different from SQL.
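To make the triple model above concrete, here is a small Python sketch that stores the two example triples in a list and answers a pattern query, where `None` acts as a wildcard much like a variable in a SPARQL triple pattern. The `match` helper is hypothetical and purely illustrative; it is not part of DBpedia, Neo4j, or Blueprints, and a real store would use indexes rather than a linear scan.

```python
# Triples as (subject, predicate, object) tuples, taken from the examples in the text.
triples = [
    ("Barack Obama", "president of", "United States of America"),
    ("Rome", "capital of", "Italy"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given fields; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly the idea behind a query such as: SELECT ?s ?o WHERE { ?s <capital of> ?o }
print(match(triples, predicate="capital of"))
```

A query with several patterns would join the results of several such matches on their shared variables, which is the part a SPARQL engine does for you.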

Enjoy. Sail Implementation · tinkerpop/blueprints Wiki. RDF data in Neo4J: the Tinkerpop story - Datablend. As mentioned in my previous blog post, I recently got asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data.

RDF data in Neo4J: the Tinkerpop story - Datablend

Traditional RDF stores are not really an option as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest paths is, however, one of the strong selling points of Graph Databases and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation, however, fills the void with an even better alternative! 1. Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. 2. Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the interface, which is part of the project.
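For readers unfamiliar with the shortest-path requirement mentioned above, the following Python sketch shows the general idea using a plain breadth-first search over triples treated as undirected edges. It is not Neo4j's or Tinkerpop's implementation, and the subject names and predicates in the sample data are made up for illustration.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Breadth-first search; returns a list of nodes from start to goal, or None if unreachable."""
    neighbours = {}
    for s, _predicate, o in triples:
        # Ignore edge direction and predicate: only connectivity matters here.
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical sample data, not taken from any real dataset.
sample = [
    ("A", "interacts with", "B"),
    ("B", "interacts with", "C"),
    ("A", "encoded by", "D"),
    ("D", "located in", "E"),
    ("E", "part of", "C"),
]
print(shortest_path(sample, "A", "C"))
```

Breadth-first search finds a path with the fewest edges; weighted shortest paths would need a different algorithm such as Dijkstra's.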