
Hackery :: Exploring Riak. I've been playing with Riak recently, one of the modern Dynamo-derived NoSQL databases (the other main ones being Cassandra and Voldemort). We're evaluating it for use as a really large brackup datastore; the primary attractions are the near-linear scalability available by adding (relatively cheap) new nodes to the cluster, and decent availability options in the face of node failures.

I've built Riak packages for RHEL/CentOS 5, available at my repository, and added support for a riak 'target' to the latest version (1.10) of brackup (packages also available at my repo). The first thing to figure out is the maximum number of nodes you expect your Riak cluster to reach. You use this to size the ring_creation_size setting, which is the number of partitions the hash space is divided into. It must be a power of 2 (64, 128, 256, etc.), and it's important because it cannot be easily changed after the cluster has been created. That's it.
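To see what ring_creation_size actually controls, here is a rough Python sketch of how a bucket/key pair lands on one of the ring's partitions. This is not Basho's code (the real hashing lives in riak_core's consistent-hashing module); it just mimics the idea of dividing a 160-bit hash space into a power-of-2 number of equal partitions:

```python
import hashlib

RING_CREATION_SIZE = 64  # must be a power of 2: 64, 128, 256, ...

def partition_for(bucket, key, ring_size=RING_CREATION_SIZE):
    """Map a bucket/key pair onto one of ring_size partitions."""
    # Hash the pair into a 160-bit space (SHA-1), then divide that
    # space into ring_size equal slices; the slice index is the
    # partition that owns the key.
    h = int(hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest(), 16)
    partition_width = 2 ** 160 // ring_size
    return h // partition_width
```

Because every node claims a fixed set of these partitions, changing ring_size after data is written would move nearly every key, which is why the setting has to be chosen up front.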

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison :: KKovacs. While SQL databases are insanely useful tools, their monopoly of the last decades is coming to an end. And it's about time: I can't even count the things that were forced into relational databases but never really fitted them. (That being said, relational databases will always be best for the stuff that has relations.) But the differences between NoSQL databases are much bigger than anything that ever separated one SQL database from another.

This means software architects carry a bigger responsibility to choose the appropriate one for a project right at the beginning. In this light, here is a comparison of open source NoSQL databases, starting with the most popular ones. Redis. Best used: for rapidly changing data with a foreseeable database size (it should fit mostly in memory). For example: to store real-time stock prices. Cassandra. Best used: when you need to store data so huge that it doesn't fit on one server, but you still want a friendly, familiar interface to it.

The post also covers MongoDB, ElasticSearch, CouchDB, and Accumulo. Blog of Data » Benchmarking Riak for the Mozilla Test Pilot Project. Introduction: Using a Riak Cluster for the Mozilla Test Pilot Project. As part of integrating Test Pilot into the Firefox 4.0 beta, we needed a production-worthy back-end for storing the experiment results and performing analysis on them. As discussed in the previous blog post, Riak and Cassandra and HBase, oh my!, we decided on Riak as that back-end. Some of the preliminary work and a lot of the initial implementation involved conducting benchmarking studies that would verify the fitness of the solution and give us a solid understanding of when and how we would need to scale.

The Test Pilot program involves the storage and processing of usability experiments from users who have opted in to the program. Our median payload size is 25 KB and the maximum item was 2 MB. We were interested to see how Riak would handle the load, especially given our plan to use pre-commit hooks to ensure the data conformed to the expected format and to limit exceedingly large files (in excess of 5 MB).

Latest News: Getting Started With Riak & Python. Riak is one of a handful of non-relational datastores that has received some exposure lately, and with good reason. It's written in Erlang, is Dynamo-inspired and, while technically a key-value store, functions very well in our experience as a document store (meaning you can store complex data as the value). Reading through the overview is highly recommended. It's also extremely predictable in production, handles node failures well and (here's the really interesting bit) demonstrates linear performance as you add nodes (see these benchmarks). This is impressive because many distributed systems, especially in the NoSQL space, don't behave like this. Like many other NoSQL options, it uses MapReduce for many types of queries (which you can write in either JavaScript for ad hoc queries or Erlang for repeated, speedy queries).
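The MapReduce query model described above can be simulated in plain Python. This is a toy sketch, not the Riak client API: the bucket contents and field names are made up, and in Riak the map function would be JavaScript or Erlang executed on the nodes holding each replica:

```python
# A toy "bucket" of JSON-style documents, as you might store in Riak.
bucket = {
    "user1": {"name": "ann", "age": 31},
    "user2": {"name": "bob", "age": 45},
    "user3": {"name": "eve", "age": 29},
}

def map_phase(doc):
    # Runs once per object; emits zero or more values.
    # Here: emit the age of every user older than 30.
    return [doc["age"]] if doc["age"] > 30 else []

def reduce_phase(values):
    # Combines the outputs of all map invocations into a final answer.
    return [sum(values)]

mapped = [v for doc in bucket.values() for v in map_phase(doc)]
result = reduce_phase(mapped)
```

The point of the shape (map per object, reduce over the combined outputs) is that the map work can run where the data lives, which is what makes the model a fit for a distributed store.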

But also of interest is that it provides an additional way to get at values: Riak Search, a Solr-like search component. Consistent hashing. Today I get back into my post series about the Google Technology Stack, with a more detailed look at distributed dictionaries, AKA distributed key-value stores, AKA distributed hash tables.

What we’d like to do is store a dictionary of key-value pairs across a cluster of computers, preferably in a way that makes it easy to manipulate the dictionary without having to think about the details of the cluster. The reason we’re interested in distributed dictionaries is that they’re used as input and output to the MapReduce framework for distributed computing. Of course, that’s not the only reason distributed dictionaries are interesting – they’re useful for many other purposes (e.g., distributed caching). But for the purposes of this post, we’ll imagine our distributed dictionaries are being used as the input and output from a MapReduce job. I’ll describe two ways of implementing distributed dictionaries. The first is a naive method that’s simple and works pretty well.
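The naive method alluded to above is "hash the key, take it modulo the number of machines". A sketch (node names are hypothetical), plus a demonstration of why it degrades when the cluster changes size, which is the problem consistent hashing solves:

```python
import hashlib

def node_for(key, nodes):
    # Naive placement: hash the key, take it modulo the node count.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
keys = [f"key-{i}" for i in range(1000)]

before = {k: node_for(k, nodes) for k in keys}
after = {k: node_for(k, nodes + ["node-d"]) for k in keys}

# Adding a single node changes (h mod n) for most keys, so most of
# the dictionary has to be shuffled between machines.
moved = sum(1 for k in keys if before[k] != after[k])
```

With consistent hashing, by contrast, adding a node only moves the keys that land in the new node's slice of the ring, roughly 1/n of them.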

Riak SmartMachine Benchmark: The Technical Details. Webmachine, ErlyDTL and Riak – Part 1 | OJ's rants. Why Riak Search Matters... The awesome dudes at Basho released Riak 0.13, and with it their first version of Riak Search, yesterday. This is all kinds of exciting, and I'll tell you why. Riak Search is (way down below) based on Lucene, both the library and the query interface. It mimics the Solr web API for querying and indexing. Just as you'd expect from something coming out of Basho, you can add and remove nodes at any time, scaling up and down as you go. The key/value model is quite restrictive when it comes to fetching data by, well, anything other than a key. Remember though, it's just a first release, which will be improved over time.

I urge you to play with it. Update: From reading all this you may get the impression that Riak Search builds heavily on a Lucene foundation. Travisswicegood.com. Using Innostore with Riak « Gradual Epiphany. Introducing Riak. Riak-js. NYC NoSQL Fall '09: Bryan Fink from Basho Technologies demonstrates the Riak web-shaped data storage engine. Minute With Riak. Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data. How would you implement a key-value storage system if you were starting from scratch? The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination: RAM holds a hash map from keys to file pointers, and a log-structured file system provides efficient writes.
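That design, an in-memory "keydir" of offsets plus an append-only data file, can be sketched in a few lines. This is a toy in-memory version for illustration only; real Bitcask adds CRCs, timestamps, hint files, and a merge process that compacts away stale entries:

```python
class ToyBitcask:
    """Append-only log plus an in-memory hash of key -> (offset, length)."""

    def __init__(self):
        self.log = bytearray()  # stands in for the on-disk data file
        self.keydir = {}        # key -> (offset, length) of the latest value

    def put(self, key, value):
        data = value.encode()
        offset = len(self.log)
        self.log.extend(data)                    # writes are pure appends
        self.keydir[key] = (offset, len(data))   # point at the newest copy

    def get(self, key):
        # One hash lookup, then one read at a known offset; in the real
        # system that is at most a single disk seek per read.
        offset, length = self.keydir[key]
        return self.log[offset:offset + length].decode()
```

Note that updating a key leaves the old value in the log; only the keydir moves. That is the trade-off the "potential issues" discussion points at: reads and writes are fast, but reclaiming dead space requires periodic merging.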

In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail. The essential Bitcask: keys are stored in memory for fast lookups. Eric Brewer (of CAP theorem fame) came up with the idea behind Bitcask by considering that if you have the capacity to keep all keys in memory, which is quite likely on modern systems, you can have a storage system that is relatively easy to design and implement. When a value is updated, it is first appended to the on-disk commit log. Some potential issues: Riak Core: Building Distributed Applications Without Shared State | Commercial Users of Functional Programming. Saturday, October 02, 2010 - 03:30 PM - 04:00 PM. Abstract: Storing big data reliably is hard. Searching that data is just as hard. Basho Technologies, the company behind Riak KV and Riak Search, focuses on solving these two problems. Both Riak KV (a key-value datastore and map/reduce platform) and Riak Search (a Solr-compatible full-text search and indexing engine) are built around a library called Riak Core that manages the mechanics of running a distributed application in a cluster without requiring a central coordinator or shared state.

Using Riak Core, these applications can scale to hundreds of servers, handle enterprise-sized amounts of data, and remain operational in the face of server failure. This talk will describe the implementation, responsibilities, and boundaries of Riak Core. Special attention will be paid to how Riak Core adopts common functional programming patterns and leverages OTP/Erlang libraries and behaviours. Toying around with Riak for Linked Data. So I stumbled upon Rob Vesse’s tweet the other day, where he said he was about to use MongoDB for storing RDF. A week earlier I had watched a nice video about links and link walking in Riak, “a Dynamo-inspired key/value store that scales predictably and easily” (see also the Wiki doc).

Now, I was wondering what it takes to store an RDF graph in Riak using Link headers. Let me say that it was very easy to install Riak and to get started with the HTTP interface. The main issue then was how to map the RDF graph onto Riak buckets, objects and keys. Here is what I've come up with so far – I use an RDF resource-level approach with a special object key that I call :id, which is the RDF resource URI or the bNode.
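A hypothetical sketch of that mapping: each RDF resource becomes a Riak object, and each triple whose object is another resource becomes an entry in the object's Link header. The `</riak/bucket/key>; riaktag="..."` shape is Riak's HTTP link convention; the bucket name and the triples here are made up for illustration:

```python
def link_header(triples, bucket="resources"):
    """Render resource-to-resource triples as a Riak Link header value."""
    links = []
    for subject, predicate, obj in triples:
        # A triple (s, p, o) whose object o is another resource becomes
        # a link from s's object to o's object, tagged with the predicate.
        links.append(f'</riak/{bucket}/{obj}>; riaktag="{predicate}"')
    return ", ".join(links)

triples = [
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "carol"),
]
header = link_header(triples)
```

With links stored this way, Riak's link walking can traverse the graph (e.g. follow every riaktag="foaf:knows" link from alice) without any SPARQL-style query layer.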

Further, in order to maintain the graph provenance, I store the original RDF document URI in the metadata of the Riak bucket. Enough words. Take the following RDF graph (in Turtle): Thoughts, anyone? Who uses Riak. Introduction à Riak. Riak at Appush, San Francisco NoSQL Meetup | build 47. A few weeks ago the San Francisco NoSQL Meetup Group held its first meeting at CBS Interactive in San Francisco, with the topic of Riak at Appush presented by Dan Reverri from Appush.

I had not previously heard of Riak, so before attending the talk I very briefly looked up what it is. I discovered it was a key-value store created by Basho, and since I had been doing some reading about Redis and Voldemort, that was enough information to get me interested. After the talk, thanks to Dan's great presentation, I discovered Riak has a lot more capabilities than a simple key-value store.

Riak is more than a simple single-server key-value store: it is distributed, scalable, and supports replication. Unlike Redis, you don't need to implement your own sharding strategy; scalability is built in and automatic. When you add a new physical node, Riak will automatically redistribute data. Dan Reverri's presentation was quite comprehensive and informative, and you can view his Riak slides on Prezi. Implementing Indexes in Riak | Jeremiah Peschka. Every database has secondary indexes, right?

Not quite. It turns out that some databases don't support them. Secondary indexes are important because they make it possible to perform more kinds of queries, quickly, on a given chunk of data. If we want to add secondary indexes to a database, how would we go about doing it? Looking at queries, there are a few basic ways that we actually query data: equality predicates, inequality predicates, and multi-predicate queries. We're going to be using Riak as our example database. Equality Predicates. The easiest type of query is an equality predicate. function(value, keyData, arg){ var data = Riak.mapValuesJson(value)[0]; if (data.Birthdate == '1976-07-04') { return [ data ]; } } With an RDBMS, we'd just create an index on the table to support our queries.

To create an index in Riak, we'd create another bucket, users_by_birthdate. Multi-Predicate Equality Searches. I'm going to skip inequality predicates for a second and talk about multi-predicate equality searches. Riak: From Design to Deploy: Velocity 2010, Web Performance & Operations Conference - O'Reilly Conferences, June 22 - 24, 2010, Santa Clara, CA.
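The users_by_birthdate idea above, an index maintained as a second bucket whose keys are the indexed values, can be sketched with plain dicts standing in for buckets (the field names and documents are made up):

```python
users = {}               # primary bucket: user key -> document
users_by_birthdate = {}  # index bucket: birthdate -> list of user keys

def store_user(key, doc):
    users[key] = doc
    # The application, not the database, keeps the index bucket in
    # sync with the primary bucket at write time.
    users_by_birthdate.setdefault(doc["Birthdate"], []).append(key)

def users_born_on(birthdate):
    # The equality query becomes a single key lookup on the index
    # bucket instead of a map phase over every object in users.
    return [users[k] for k in users_by_birthdate.get(birthdate, [])]

store_user("u1", {"name": "ann", "Birthdate": "1976-07-04"})
store_user("u2", {"name": "bob", "Birthdate": "1980-01-01"})
```

This trades write-time work (and the risk of the two buckets drifting apart on partial failure) for reads that no longer scan the whole dataset, which is exactly the bargain a secondary index makes in any store.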

Riak - An Open Source Scalable Data Store.