
Giraph - Welcome To Apache Giraph!

Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale.
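The Bulk Synchronous Parallel idea mentioned above can be illustrated with a small sketch. This is plain Python, not Giraph's actual Java API: the function name `run_supersteps`, the adjacency-dictionary graph, and the "propagate the maximum value" task are all assumptions chosen for illustration. In each superstep, a vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages only if the value changed. The computation stops when no messages remain in flight.

```python
from collections import defaultdict


def run_supersteps(edges, values, max_supersteps=30):
    """Spread the largest vertex value to every reachable vertex, BSP-style.

    edges:  dict mapping each vertex to a list of neighbours (hypothetical format)
    values: dict mapping each vertex to its initial numeric value
    Every vertex named in `edges` is assumed to have an entry in `values`.
    """
    values = dict(values)  # work on a copy so the caller's dict is untouched

    # Superstep 0: every vertex sends its current value to its neighbours.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, ()):
            inbox[neighbour].append(value)
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in edges.get(vertex, ()):
                    outbox[neighbour].append(best)
            # A vertex whose value did not change sends nothing ("votes to halt").
        inbox = outbox  # synchronisation barrier between supersteps
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
    start = {"a": 3, "b": 6, "c": 2, "d": 1}
    final_values, steps = run_supersteps(graph, start)
    print(final_values, steps)
```

In a real system the vertices are partitioned across workers and the barrier is a distributed synchronisation point; here a single loop plays both roles, which keeps the sketch short at the cost of hiding the distribution.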


EasyMock Requirements EasyMock requires Java 1.5.0 and above. Cglib (2.2+) and Objenesis (2.0+) must be in the classpath to perform class mocking. Using Maven EasyMock is available in the Maven central repository. Just add the following dependency to your pom.xml:

5 Graph Databases to Consider Of the major categories of NoSQL databases - document-oriented databases, key-value stores and graph databases - we've given the least attention to graph databases on this blog. That's a shame, because as many have pointed out, it may become the most significant category. Graph databases apply graph theory to the storage of information about the relationships between entries. The relationships between people in social networks are the most obvious example. The relationships between items and attributes in recommendation engines are another.

Sqrrl Enterprise - Linked Data Analysis for Hadoop Our flagship product is Sqrrl Enterprise, a unified solution for integrating data to enable secure, real-time search, discovery, and analytics, powered by Apache Accumulo. Sqrrl Enterprise enables organizations to ingest, secure, connect, and analyze massive amounts of structured, semi-structured, and unstructured data: Ingest: Streaming or bulk data ingest from any source. Secure: Encryption and labeling of data with fine-grained access controls. Connect: Automatically organize data and extract information about the entities and relationships you care about. Analyze: Web-based dashboarding and visual, contextual navigation of the data and relationships in the system. Clients use Sqrrl Enterprise for a variety of real-time Big Data applications, including cybersecurity analytics, healthcare analytics, and intelligence analysis. Sqrrl licenses Sqrrl Enterprise via annual subscription models.

Apache Giraph (Incubating) Web and online social graphs have been rapidly growing in size and scale during the past decade. In 2008, Google estimated that the number of web pages reached over a trillion.

Research Publication: Sawzall Interpreting the Data: Parallel Analysis with Sawzall Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan Abstract Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database.

guava-libraries - Guava: Google Core Libraries for Java 1.6+ The Guava project contains several of Google's core libraries that we rely on in our Java-based projects: collections, caching, primitives support, concurrency libraries, common annotations, string processing, I/O, and so forth. The latest release is 16.0.1, released February 4, 2014. Start using Guava You can download a JAR at:

Which freaking database should I use? August 02, 2012 Follow @acoliver I've been in Chicago for the last few weeks setting up our first satellite office for my company.

AWS Lambda The code you run on AWS Lambda is called a “Lambda function.” After you create your Lambda function it is always ready to run as soon as it is triggered, similar to a formula in a spreadsheet. Each function includes your code as well as some associated configuration information, including the function name and resource requirements.

Dato Core™ Open Source SFrame™, the fast, scalable engine of GraphLab Create™, is now open source. The SFrame project provides the complete implementation of the following: SFrame, SArray, SGraph, and the C++ SDK surface area (gl_sframe, gl_sarray, gl_sgraph). Support for strictly typed columns (int, float, str, datetime), weakly typed columns (schema free lists, dictionaries) as well as specialized types such as Image. Uniform support for missing data.

Kafka Prior releases: 0.7.x, 0.8.0. 1. Getting Started The streams framework is a Java implementation of a simple stream processing environment. It aims at providing a clean and easy-to-use Java-based platform to process streaming data. The core module of the streams library is a thin API layer of interfaces and classes that reflect a high-level view of streaming processes.

Overview of Bulbs, a Python Framework for Graph Databases like Neo4j A Python framework for graph databases. Bulbs is an open-source Python persistence framework for graph databases and the first piece of a larger Web-development toolkit that will be released in the upcoming weeks. It’s like an ORM for graphs, but instead of SQL, you use the graph-traversal language Gremlin to query the database. Bulbs supports pluggable backends, and you can use it to connect to either Neo4j Server or Rexster.

Pivotal Cloud Foundry What is the Buildpack Architecture in Pivotal Cloud Foundry? Pivotal CF uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular type of application. Since buildpacks are extensible to most modern runtimes and frameworks, applications written in nearly any language can be deployed to Pivotal Cloud Foundry. Developers benefit from an “it just works” experience as the platform applies the appropriate buildpack to detect, download and configure the language, framework, container and libraries for the application. The buildpacks that Pivotal Cloud Foundry provides for Java, Ruby, Node, PHP, Python and golang are part of a broad buildpack provider ecosystem that ensures constant updates and maintenance for virtually any language.

Knuth: The Stanford GraphBase by Donald E. Knuth (New York: ACM Press, 1994), viii+576pp. Co-published by Addison-Wesley Publishing Company.
