
HBase - Apache HBase™ Home

Welcome to Apache HBase™

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

When Would I Use Apache HBase?

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al.

http://hbase.apache.org/
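
As a rough illustration of what "versioned" and "billions of rows X millions of columns" imply, a Bigtable-style table can be pictured as a sparse map from row key to column to timestamp to value. The sketch below is a toy in-memory model under that assumption; the class name `VersionedTable`, its methods, and the `max_versions` parameter are hypothetical and are not the HBase client API.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy model of a sparse, versioned table: row -> column -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        # Assumed retention limit per cell; must be at least 1.
        self.max_versions = max_versions
        # Only cells that were actually written exist, so rows can be sparse.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp=None):
        """Store a new version of a cell, keeping only the newest max_versions."""
        ts = timestamp if timestamp is not None else time.time_ns()
        versions = self._rows[row][column]
        versions[ts] = value
        # Drop the oldest versions beyond the retention limit.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None
```

Reading a cell without a timestamp returns the latest version, while passing a timestamp returns the value as of that point, provided that version has not been evicted.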


Berkeley DB Java Edition Oracle Berkeley DB Java Edition is an open source, embeddable, transactional storage engine written entirely in Java. It takes full advantage of the Java environment to simplify development and deployment. The architecture of Oracle Berkeley DB Java Edition supports very high performance and concurrency for both read-intensive and write-intensive workloads.

Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster) Step 1: Creating a designated hadoop user on your system This isn't -entirely- necessary, but it's a good idea for security reasons. To add a user, go to:

Amazon's Dynamo In two weeks we’ll present a paper on the Dynamo technology at SOSP, the prestigious biannual Operating Systems conference. Dynamo is internal technology developed at Amazon to address the need for an incrementally scalable, highly-available key-value storage system. The technology is designed to give its users the ability to trade-off cost, consistency, durability and performance, while maintaining high-availability. Let me emphasize the internal technology part before it gets misunderstood: Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3.

Leveling the Field July 1, 2011 For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored.

Object2RecordJavaBinding - orient - Object to Record mapping - NoSQL document database light, portable and fast. Supports ACID Tx, Indexes, asynch queries, SQL layer, clustering, etc The ObjectDatabase implementation makes things easier for the Java developer since the binding between Objects and Records is transparent. How does it work? OrientDB uses Java reflection and Javassist Proxy to bind POJOs to Records directly. Those proxied instances take care of the synchronization between the POJO and the underlying record. Every time you invoke a setter method against the POJO, the value is early bound into the record. Every time you call a getter method the value is retrieved from the record if the POJO's field value is null.
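
The OrientDB binding behavior described above (setters write through to the record at once, getters fall back to the record when the object's field is null) can be sketched in a few lines. This is only an illustration of the idea in Python, not OrientDB's Java reflection/Javassist implementation; `RecordProxy`, `Person`, and the plain dict standing in for a record are all hypothetical.

```python
class RecordProxy:
    """Hypothetical proxy pairing a plain object with a dict standing in for a record."""

    def __init__(self, target, record):
        # Bypass our own __setattr__ while wiring up internals.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_record", record)

    def __setattr__(self, name, value):
        # "Setter": update the object and write through to the record immediately.
        setattr(self._target, name, value)
        self._record[name] = value

    def __getattr__(self, name):
        # "Getter": Python calls this only for names not found on the proxy itself.
        value = getattr(self._target, name, None)
        if value is None:
            # Field not populated yet, so load it from the record and cache it.
            value = self._record.get(name)
            setattr(self._target, name, value)
        return value


class Person:
    """A plain object with no persistence logic of its own."""

    def __init__(self):
        self.name = None
        self.city = None
```

Wrapping a `Person` with `RecordProxy(Person(), {"name": "Ada"})` makes `proxy.name` load lazily from the record, and assigning `proxy.city = "London"` updates the record at the same time as the object.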

Titan: Distributed Graph Database Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. In addition, Titan provides the following features: Download Titan or clone from GitHub. Read the Titan documentation and join the mailing list.
<dependency><groupId>com.thinkaurelius.titan</groupId><artifactId>titan-core</artifactId><version>1.0.0</version></dependency>

Build, Install and Configure Eclipse Plugin for Apache Hadoop 2.2.0 - SrcCodes
C:\hadoop2x-eclipse-plugin\src\contrib\eclipse-plugin>ant jar -Dversion=2.2.0 -Declipse.home=C:/IDE/sts-3.5.0 -Dhadoop.home=c:/hadoop
Buildfile: C:\hadoop2x-eclipse-plugin\src\contrib\eclipse-plugin\build.xml
check-contrib:
init:

Database Sharding CodeFutures offers an effective sharding solution with our product, dbShards. Our customers have used dbShards to achieve unprecedented performance, on the order of hundreds of millions of reads and millions of writes every day.

The Rise of Database Sharding The concept of Database Sharding has been gaining popularity over the past several years, due to the enormous growth in transaction volume and size of business application databases.
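
For readers new to the concept, the core idea of sharding is to route each row to one of several independent databases based on a key. The sketch below shows one common approach, hash-based routing, with in-memory dicts standing in for the shards. It is a generic illustration and makes no claim about how dbShards works; `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store split across several dicts, each standing in for a database."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        # Each write touches exactly one shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # Each read is routed to the shard that owns the key.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Because every key lives on exactly one shard, reads and writes for different keys can be served by different machines. One known trade-off of plain modulo routing is that changing the number of shards remaps most keys.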

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison (Yes it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.) While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.) But, the differences between NoSQL databases are much bigger than ever was between one SQL database and another.

Plain Old Java Object "We wondered why people were so against using regular objects in their systems and concluded that it was because simple objects lacked a fancy name. So we gave them one, and it's caught on very nicely."[1] The term "POJO" initially denoted a Java object which does not follow any of the major Java object models, conventions, or frameworks; nowadays "POJO" may be used as an acronym for "Plain Old JavaScript Object" as well, in which case the term denotes a JavaScript object of similar pedigree.[2] The term continues the pattern of older terms for technologies that do not use fancy new features, such as POTS (Plain Old Telephone Service) in telephony and Pod (Plain Old Documentation) in Perl.

Representing time dependent graphs in Neo4j · SocioPatterns/neo4j-dynagraph Wiki Background Large-scale data collection efforts using wearable sensors to mine for proximity of individuals (for example, the SocioPatterns project) produce time-varying social graphs, where nodes are individuals, edges represent proximity/contact relations of individuals, and the proximity graph changes over time. Both nodes and edges can have rich attributes. Data formats for exchanging the time-dependent graphs are available, see for instance the GEXF format.

IntelliJ Project for Building Hadoop – The Definitive Guide Examples I have been studying Hadoop – The Definitive Guide by Tom White and started building the sample applications with the Makefile I discussed in my last blog. Although the Makefile approach works, I decided to try using the IntelliJ Community Edition IDE to build the examples in any given chapter all at once. This time around I’ll walk you through a procedure to create an IntelliJ project for building Hadoop applications. Install IntelliJ If you don’t have it already, you can get the latest version of IntelliJ Community Edition here. Select the package for your operating system of choice, either Mac OS or Linux, then install IntelliJ by placing the package contents in your directory of choice.

CTO of 10gen, MongoDB creators: We are sort of similar to MySQL or PostgreSQL in terms of how you could use us « myNoSQL Some quotes and comments from a (quite long) interview with Eliot Horowitz, CTO of 10gen, creators of MongoDB: I think the first question you have to ask about any database these days is, “What’s the data model?” The only thing I’d add is: “… and how does that fit my problem?”.

A distributed, scalable, big data store with random, real time read/write access. by sergeykucherov Jul 15
