Summarize Opinions with a Graph – Part 1.

How does the saying go? Opinions are like bellybuttons, everybody’s got one? So let’s say you have an opinion that NoSQL is not for you. Maybe you read my blog and think this graph database stuff is great for recommendation engines and path finding and maybe some other stuff, but you’ve got really hard problems and it can’t help you. I am going to try to show you that a graph database can help you solve your really hard problems if you can frame your problem in terms of a graph. Did I say “you”?

We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions.
What does that mean? How is this useful? Let’s dive into what this means with an example that everyone is familiar with: e-commerce. You can see the 1-to-5 star ratings, and you already know how to build a recommendation algorithm out of them. Today we are just going to look at Step 1. My phone calls drop frequently with the iPhone.

An overview of Neo4j Internals.

Usage of Neo4j in a professional web-based scientific software. This section will quickly cover internal concepts and features from QS2 and how we have handled them with Neo4j. Each sub-section could be covered by another blog post, so don't hesitate to ask for more!
One database per customer. With embedded Neo4j, this feature is really easy to implement: simply create a new EmbeddedGraphDatabase with the right path :). This way, the databases currently in use are the only ones kept in memory (important in a SaaS environment). In addition to security, another advantage is load balancing.

Upgraders and migrations. Even though Neo4j is schemaless, we have to send DTOs to the client, so each node is typed and follows a core schema (defined by the DTO and linked to Neo4j using getProperty and setProperty calls).
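As a rough illustration of a typed node that follows a DTO-defined core schema, here is a minimal Python sketch. It does not use the Neo4j API: `FakeNode`, `SampleDto`, `dto_to_node`, `node_to_dto`, and the `"type"` property key are all hypothetical names, and the node only mimics getProperty/setProperty-style access with a dictionary.

```python
from dataclasses import dataclass, fields


class FakeNode:
    """Hypothetical stand-in for a graph node; it only mimics property access."""

    def __init__(self):
        self._props = {}

    def set_property(self, key, value):
        self._props[key] = value

    def get_property(self, key):
        return self._props[key]


@dataclass
class SampleDto:
    # Hypothetical core schema: the DTO fields define the node's properties.
    name: str
    volume_ml: float


NODE_TYPE_KEY = "type"  # assumed property holding the node's type name


def dto_to_node(dto, node):
    """Write the DTO's type and every field onto the node as properties."""
    node.set_property(NODE_TYPE_KEY, type(dto).__name__)
    for field in fields(dto):
        node.set_property(field.name, getattr(dto, field.name))
    return node


def node_to_dto(node, dto_class):
    """Rebuild a DTO from a node, refusing nodes of a different type."""
    if node.get_property(NODE_TYPE_KEY) != dto_class.__name__:
        raise TypeError("node is not a " + dto_class.__name__)
    values = {f.name: node.get_property(f.name) for f in fields(dto_class)}
    return dto_class(**values)
```

Keeping the type name on the node is one way to let a schemaless store still reject a read that asks for the wrong DTO class.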
An upgrader is a class that declares the version number it upgrades and implements an "upgrade" method.

"Parameters" nodes. One important feature of QS2 is user-defined parameters.

Ordered trees

"Primary key" and Ids

Now, Neo4j.

NodeId

Chained local IDs.

Get the full Neo4j power by using the Core Java API for traversing your graph database instead of the Cypher Query Language.

As I said yesterday, I have been busy over the last months producing content, so here you go. For related work we are most likely to use Neo4j as the core database. This makes sense since we are basically building some kind of a social network. Most queries that we need to answer, while offering the service or during data mining, have a friend-of-a-friend structure. Some of the queries do counting or aggregation, so I was wondering what the most efficient way of querying a Neo4j database is.
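To make the friend-of-a-friend structure with counting concrete, here is a minimal sketch in plain Python. It is not the Neo4j Core API or Cypher: the graph is assumed to be a dictionary mapping each person to a set of direct friends, and `friends_of_friends` is a hypothetical helper name.

```python
from collections import Counter


def friends_of_friends(adjacency, start):
    """Count, for each node exactly two hops from start, the connecting paths.

    adjacency: dict mapping a node to the set of its direct neighbours
    (an assumed in-memory stand-in for the graph database).
    """
    direct = adjacency.get(start, set())
    counts = Counter()
    for friend in direct:
        for candidate in adjacency.get(friend, set()):
            # Skip the start node itself and anyone who is already a friend.
            if candidate != start and candidate not in direct:
                counts[candidate] += 1
    return counts


# Hypothetical sample data for illustration only.
social = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
mutual_counts = friends_of_friends(social, "alice")
```

The count per candidate is the number of shared friends, which is the kind of aggregation that makes the choice of query mechanism matter.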
So I did a benchmark with quite surprising results. Just a quick remark: we used a database consisting of papers and authors extracted from arxiv.org, one of the biggest preprint sites available on the web.

    Paper1 <--[ref]--> Paper2
      |                  |
      |[author]          |[author]
      v                  v
    Author1            Author2

For the benchmark we were trying to find coauthors, which is basically a friend-of-a-friend query following the author relationship (or a breadth-first search of depth 2).

Java Core API Traverser Framework.

PEGASUS: Peta-Scale Graph Mining System.

Pegasus: an award-winning, open-source graph-mining system with massive scalability. Analyze petabytes of graph data with ease. We won the Open Source Software World Challenge, Silver Award.

Metadata and Semantic Technologies Series. October 12, 2011.

MongoGraph is an effort to bring the Semantic Web to MongoDB developers. We implemented a MongoDB interface to AllegroGraph to give JavaScript programmers both joins and the Semantic Web.
JSON objects are automatically translated into triples, and both the MongoDB query language and SPARQL work against your objects. Join us for this webcast to learn more about working at the level of objects instead of individual triples, where an object is defined as all the triples with the same subject.

View a recording of the event here (30 min). Download the presentation slides here. View the demonstration portion only (10 min).
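The idea of translating JSON objects into triples, and of treating an object as all the triples that share a subject, can be sketched in a few lines of Python. This is a simplified assumption about the mapping, not AllegroGraph's actual translation: `object_to_triples` and `triples_to_objects` are hypothetical names, nested objects are not handled, and the `_id` field is assumed to serve as the subject.

```python
from collections import defaultdict


def object_to_triples(obj, id_key="_id"):
    """Flatten one JSON-like dict into (subject, predicate, object) triples."""
    subject = obj[id_key]
    triples = []
    for key, value in obj.items():
        if key == id_key:
            continue
        # A list-valued field yields one triple per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, key, item))
    return triples


def triples_to_objects(triples, id_key="_id"):
    """Group triples by subject, rebuilding one dict per subject."""
    grouped = defaultdict(dict)
    for subject, predicate, value in triples:
        obj = grouped[subject]
        obj[id_key] = subject
        if predicate in obj:
            existing = obj[predicate]
            if isinstance(existing, list):
                obj[predicate] = existing + [value]
            else:
                obj[predicate] = [existing, value]
        else:
            obj[predicate] = value
    return dict(grouped)
```

One limitation of this sketch: a single-element list comes back as a scalar after a round trip, because the triples alone do not record whether a field was originally a list.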