background preloader

Welcome To Apache Incubator Giraph

Welcome To Apache Incubator Giraph
Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale.

http://giraph.apache.org/

Making Hadoop 1000x Faster for Graph Problems Dr. Daniel Abadi, author of the DBMS Musings blog and Cofounder of Hadapt, which offers a product improving Hadoop performance by 50x on relational data, is now taking his talents to graph data in Hadoop's tremendous inefficiency on graph data management (and how to avoid it), which shares the secrets of getting Hadoop to perform 1000x better on graph data. Analysing graph data is at the heart of important data mining problems.Hadoop is the tool of choice for many of these problems.Hadoop style MapReduce works best on KeyValue processing, not graph processing, and can be well over a factor of 1000 less efficient than it needs to be.Hadoop inefficiency has consequences in real world. Voila! That's a 10x * 10x * 10x = 1000x performance improvement on graph problems using techniques that make a lot of sense.

Sqrrl Enterprise - Linked Data Analysis for Hadoop Our flagship product is Sqrrl Enterprise, a unified solution for integrating data to enable secure, real-time search, discovery, and analytics, powered by Apache Accumulo. Sqrrl Enterprise enables organizations to ingest, secure, connect, and analyze massive amounts of structured, semi-structured, and unstructured data: Ingest: Streaming or bulk data ingest from any source.Secure: Encryption and labeling of data with fine-grained access controls.Connect: Automatically organize data and extract information about the entities and relationships you care about.Analyze: Web-based dashboarding and visual, contextual navigation of the data and relationships in the system. Clients use Sqrrl Enterprise for a variety of real-time Big Data applications, including cybersecurity analytics, healthcare analytics, and intelligence analysis. Sqrrl licenses Sqrrl Enterprise via annual subscriptions models.

mikeaddison93/hoya How to Build an SQL Storage Adapter for RDF Data with Ruby - The Datagraph Blog RDF.rb is approaching two thousand downloads on RubyGems, and while it has good documentation it could still use some more tutorials. I recently needed to get RDF.rb working with a PostgreSQL storage backend in order to work with RDF data in a Rails 3.0 application hosted on Heroku. I thought I'd keep track of what I did so that I could discuss the notable parts. In this tutorial we'll be implementing an RDF.rb storage adapter called RDF::DataObjects::Repository, which is a simplified version of what I eventually ended up with.

AWS Lambda The code you run on AWS Lambda is called a “Lambda function.” After you create your Lambda function it is always ready to run as soon as it is triggered, similar to a formula in a spreadsheet. Each function includes your code as well as some associated configuration information, including the function name and resource requirements.

mikeaddison93/hadoop-20 Building a Graph-Based Movie Recommender Engine A recommender engine helps a user find novel and interesting items within a pool of resources. There are numerous types of recommendation algorithms and a graph can serve as a general-purpose substrate for evaluating such algorithms. This post will demonstrate how to build a graph-based movie recommender engine using the publicly available MovieLens dataset, the graph database Neo4j, and the graph traversal language Gremlin. Feel free to follow along in the Gremlin console as the post will go step-by-step from data acquisition, to parsing, and ultimately, to traversing. The MovieRatings Dataset The GroupLens research group has made available a corpus of movie ratings. Pivotal Cloud Foundry What is the Buildpack Architecture in Pivotal Cloud Foundry? Pivotal CF uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular type of applications. Since buildpacks are extensible to most modern runtimes and frameworks, applications written in nearly any language can be deployed to Pivotal Cloud Foundry. Developers benefit from an “it just works” experience as the platform applies the appropriate buildpack to detect, download and configure the language, framework, container and libraries for the application. Pivotal Cloud Foundry provided buildpacks for Java, Ruby, Node, PHP, Python and golang are part of a broad buildpack provider ecosystem that ensures constant updates and maintenance for virtually any language.

JetS3t Stig Database What is BigQuery? - Google BigQuery Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data. You can access BigQuery by using a web UI or a command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET or Python.

Cloud Services - HDInsight (Hadoop) Scale elastically on demand HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. Big Data Business Insights Looking for data-driven business insights? With the integrated Teradata Aster Discovery Platform, organizations attain unmatched competitive advantage by making it faster and easier for a wider group of users to generate powerful, high impact business insights from big data. Products and Solutions Teradata Workload Specific Platforms Teradata Hardware Overview - The Teradata Platform Family, all running the Teradata Database, includes the Active Enterprise Data Warehouse, the Data Warehouse Appliance, the Data Mart Appliance, and the Integrated Big Data Platform.

Handling five billion sessions a day – in real time Since we first released Answers seven months ago, we’ve been thrilled by tremendous adoption from the mobile community. We now see about five billion sessions per day, and growing. Hundreds of millions of devices send millions of events every second to the Answers endpoint. During the time that it took you to read to here, the Answers back-end will have received and processed about 10,000,000 analytics events. The challenge for us is to use this information to provide app developers with reliable, real-time and actionable insights into their mobile apps. At a high level, we guide our architectural decisions on the principles of decoupled components, asynchronous communication and graceful service degradation in response to catastrophic failures.

Related: