
NoSQL


Exploring Riak. Been playing with Riak recently, which is one of the modern Dynamo-derived NoSQL databases (the other main ones being Cassandra and Voldemort).

Exploring Riak

We're evaluating it for use as a really large brackup datastore, the primary attractions being the near-linear scalability available by adding (relatively cheap) new nodes to the cluster, and decent availability options in the face of node failures. I've built riak packages for RHEL/CentOS 5, available at my repository, and added support for a riak 'target' to the latest version (1.10) of brackup (packages also available at my repo).

The first thing to figure out is the maximum number of nodes you expect your riak cluster to get to. You use this to size the ring_creation_size setting, which is the number of partitions the hash space is divided into. It must be a power of 2 (64, 128, 256, etc.), and it matters because it cannot easily be changed after the cluster has been created.
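As a rough illustration of the sizing step, here is a minimal sketch that rounds a target partition count up to the next power of 2. The function names and the `partitions_per_node` default of 10 are assumptions made for this example, not values taken from the post or from Riak itself; pick a per-node target that suits your own cluster.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def suggest_ring_creation_size(max_nodes: int, partitions_per_node: int = 10) -> int:
    """Smallest power of two giving every node at least `partitions_per_node` partitions.

    `partitions_per_node` is a hypothetical rule of thumb for this sketch,
    not a documented Riak setting.
    """
    if max_nodes < 1 or partitions_per_node < 1:
        raise ValueError("max_nodes and partitions_per_node must be positive")
    needed = max_nodes * partitions_per_node
    # (needed - 1).bit_length() is the exponent of the next power of two >= needed.
    return 1 << (needed - 1).bit_length()


if __name__ == "__main__":
    for nodes in (6, 12, 50):
        size = suggest_ring_creation_size(nodes)
        assert is_power_of_two(size)
        print(f"{nodes} nodes -> ring_creation_size {size}")
```

With the assumed default, a cluster expected to reach 6 nodes gets 64 partitions, 12 nodes gets 128, and 50 nodes gets 512. Since the setting cannot easily be changed later, it seems safer to size for the largest cluster you can foresee.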

That's it.

San Francisco Riak Meetup (San Francisco, CA.

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison. (Yes it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.)

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

While SQL databases are insanely useful tools, their monopoly of the last decades is coming to an end. And it's about time: I can't even count the things that were forced into relational databases but never really fit them. (That being said, relational databases will always be the best for the stuff that has relations.) But the differences between NoSQL databases are much bigger than they ever were between one SQL database and another. This means that software architects bear a bigger responsibility to choose the appropriate one for a project right at the beginning.

NoSQL Databases - NoSQL Databases.

Blog of Data » Blog Archive » Benchmarking Riak for the Mozilla Test Pilot Project. Introduction: Using A Riak Cluster for the Mozilla Test Pilot Project. As part of integrating Test Pilot into the Firefox 4.0 beta, we needed a production-worthy back-end for storing the experiment results and performing analysis on them.

Blog of Data » Blog Archive » Benchmarking Riak for the Mozilla Test Pilot Project

As discussed in the previous blog post, "Riak and Cassandra and HBase, oh my!", we decided on Riak as that back-end. Some of the preliminary work and a lot of the initial implementation involved conducting benchmarking studies that would verify the fitness of the solution and give us a solid understanding of when and how we would need to scale. Mozilla worked with Basho (the stewards of the Riak project) to perform this benchmarking, and this blog post details the results. The Test Pilot program involves the storage and processing of usability experiments from users who have opted in to the program. So our median payload size is 25 KB and the max item was 2 MB.

ErlangUserConference2009-RustyKlophaus.

Latest News: Getting Started With Riak & Python. Riak is one of a handful of non-relational datastores that have experienced some exposure lately, and with good reason.

Latest News: Getting Started With Riak & Python

It's written in Erlang, is Dynamo-inspired and, while technically a key-value store, functions very well in our experience as a document store (meaning you can store complex data as the value). Reading through the overview is highly recommended. It's also extremely predictable in production, handles node failures well and (here's the really interesting bit) demonstrates linear performance scaling as you add nodes (see these benchmarks). This is impressive because many distributed systems, especially in the NoSQL space, don't behave like this.

Like many other NoSQL options, it uses MapReduce for many types of queries (which you can write in either JavaScript for ad hoc queries or Erlang for repetitious, speedy queries). I've spent a lot of time evaluating most of the NoSQL field recently, and I keep coming back to Riak (well, and of course Redis) as the weapon of choice.

Consistent hashing. Today I get back into my post series about the Google Technology Stack, with a more detailed look at distributed dictionaries, AKA distributed key-value stores, AKA distributed hash tables.

Consistent hashing

What we’d like to do is store a dictionary of key-value pairs across a cluster of computers, preferably in a way that makes it easy to manipulate the dictionary without having to think about the details of the cluster. The reason we’re interested in distributed dictionaries is because they’re used as input and output to the MapReduce framework for distributed computing. Of course, that’s not the only reason distributed dictionaries are interesting – they’re useful for many other purposes (e.g., distributed caching). But for the purposes of this post, we’ll imagine our distributed dictionaries are being used as the input and output from a MapReduce job.

Riak SmartMachine Benchmark: The Technical Details. Webmachine, ErlyDTL and Riak – Part 1.
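To make the distributed dictionary from the consistent hashing excerpt a little more concrete, here is a minimal sketch of a consistent hash ring that decides which machine owns a given key. The class and method names, the choice of SHA-1, and the default of 64 points per node are all assumptions made for this illustration; the excerpt does not show the author's own implementation.

```python
import bisect
import hashlib


def _position(label: str) -> int:
    """Map a string to a point on the ring (an integer derived from SHA-1)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent hash ring: each node owns several points on a circle."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._points = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several points per node spread the load more evenly around the ring.
        for i in range(self.points_per_node):
            bisect.insort(self._points, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first point at or after the key's position."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, (_position(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around the circle


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding a node")
```

The property worth noticing is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, which is what makes growing the cluster cheap compared with a plain `hash(key) % n` scheme.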

Pivotal Labs: Talks.

Why Riak Search Matters... The awesome dudes at Basho released Riak 0.13 and with it their first version of Riak Search yesterday.

Why Riak Search Matters...

This is all kinds of exciting, and I'll tell you why. Riak Search is (way down below) based on Lucene, both the library and the query interface. It mimics the Solr web API for querying and indexing. Just as you'd expect from something coming out of Basho, you can add and remove nodes at any time, scaling up and down as you go. I saw an introduction to the basics back at Berlin Buzzwords, and it was already shaping up to be nothing but impressive.

Travisswicegood.com. Using Innostore with Riak « Gradual Epiphany. Introducing Riak. Riak-js. NYC NoSQL Fall '09: Bryan Fink from Basho Technologies demonstrates the riak web-shaped data storage engine. Minute With Riak.

Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data.

How would you implement a key-value storage system if you were starting from scratch?

Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data

The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination of using RAM to store a hash map of file pointers to values and a log-structured file system for efficient writes. In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail. The essential Bitcask: Keys are stored in memory for fast lookups. All keys must fit in RAM. Writes are append-only, which means writes are strictly sequential and do not require seeking. Eric Brewer (CAP theorem) came up with the idea behind Bitcask by considering that if you have the capacity to keep all keys in memory, which is quite likely on modern systems, you can have a storage system that is relatively easy to design and implement.
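The two ideas above (an in-memory hash map of file pointers plus append-only writes) can be sketched in a few lines. This is a toy illustration, not Bitcask's actual on-disk format: the class name, the record layout (two 4-byte lengths followed by key and value), and the single data file are all assumptions, and it leaves out checksums, deletes, and compaction of old records.

```python
import os
import struct

_HEADER = struct.Struct(">II")  # assumed record header: key length, value length


class TinyLogStore:
    """Toy log-structured store: all keys in RAM, values in an append-only file."""

    def __init__(self, path: str):
        self._file = open(path, "a+b")  # append mode: every write lands at the end
        self._keydir = {}               # key -> (value offset, value length)
        self._rebuild()

    def _rebuild(self) -> None:
        """Scan the log once at startup to rebuild the in-memory key directory."""
        self._file.seek(0)
        offset = 0
        while True:
            header = self._file.read(_HEADER.size)
            if len(header) < _HEADER.size:
                break
            klen, vlen = _HEADER.unpack(header)
            key = self._file.read(klen)
            value_offset = offset + _HEADER.size + klen
            self._file.seek(vlen, os.SEEK_CUR)       # skip the value itself
            self._keydir[key] = (value_offset, vlen)  # later records win
            offset = value_offset + vlen

    def put(self, key: bytes, value: bytes) -> None:
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(_HEADER.pack(len(key), len(value)) + key + value)
        self._file.flush()
        self._keydir[key] = (offset + _HEADER.size + len(key), len(value))

    def get(self, key: bytes) -> bytes:
        value_offset, vlen = self._keydir[key]  # one dict lookup, then one read
        self._file.seek(value_offset)
        return self._file.read(vlen)

    def close(self) -> None:
        self._file.close()
```

Overwriting a key simply appends a new record and repoints the in-memory entry, so the old value stays in the file as garbage. A real system needs some way to reclaim that space, which this sketch does not attempt.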

Riak Core: Building Distributed Applications Without Shared State. Saturday, October 02, 2010 - 03:30 PM - 04:00 PM.

Riak Core: Building Distributed Applications Without Shared State

Toying around with Riak for Linked Data. So I stumbled upon Rob Vesse’s tweet the other day, where he said he was about to use MongoDB for storing RDF.

Toying around with Riak for Linked Data

A week earlier I watched a nice video about links and link walking in Riak, “a Dynamo-inspired key/value store that scales predictably and easily” (see also the Wiki doc). Now, I was wondering what it takes to store an RDF graph in Riak using Link headers. Let me say that it was very easy to install Riak and to get started with the HTTP interface. The main issue then was how to map the RDF graph into Riak buckets, objects and keys. Here is what I came up with so far: I use an RDF resource-level approach with a special object key that I called :id, which is the RDF resource URI or the bNode. Enough words. Take the following RDF graph (in Turtle):

Who uses Riak. Introduction à Riak.

Riak at Appush, San Francisco NoSQL Meetup. A few weeks ago the San Francisco NoSQL Meetup Group held its first meeting at CBS Interactive in San Francisco with the topic of Riak at Appush presented by Dan Reverri from Appush.

Riak at Appush, San Francisco NoSQL Meetup

I had not previously heard of Riak, so before attending the talk I very briefly looked up what it is. I discovered it was a key-value store created by Basho, and since I had been doing some reading about Redis and Voldemort, that was enough information to get me interested. After the talk, thanks to Dan's great presentation, I discovered Riak has a lot more capabilities than just a simple key-value store. Riak is more than a simple single-server key-value store: it is distributed, scalable, and supports replication. Unlike Redis, you don't need to implement your own sharding strategy; scalability is built in and automatic.

Implementing Indexes in Riak. Riak: From Design to Deploy: Velocity 2010, Web Performance & Operations Conference - O'Reilly Conferences, June 22 - 24, 2010, Santa Clara, CA. Riak - An Open Source Scalable Data Store. Jeremiah Peschka. Every database has secondary indexes, right? Not quite. It turns out that some databases don’t support them. Secondary indexes are important because they make it possible to perform more queries, quickly, on a given chunk of data. What if we want to add secondary indexes to a database? How would we go about doing it?
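One common way to answer that question on top of a plain key-value store is to maintain a second mapping from (field, value) to the set of primary keys, and to update it on every write. The sketch below uses in-memory dicts to stand in for the store; the class and method names are hypothetical and this is not necessarily the approach the linked article takes.

```python
class IndexedStore:
    """Toy key-value store with hand-maintained secondary indexes."""

    def __init__(self, indexed_fields):
        self._indexed_fields = tuple(indexed_fields)
        self._data = {}   # primary key -> record (a dict)
        self._index = {}  # (field, value) -> set of primary keys

    def put(self, key, record):
        self.delete(key)  # drop stale index entries from any previous version
        self._data[key] = dict(record)
        for field in self._indexed_fields:
            if field in record:
                self._index.setdefault((field, record[field]), set()).add(key)

    def delete(self, key):
        old = self._data.pop(key, None)
        if old is None:
            return
        for field in self._indexed_fields:
            if field in old:
                entry = (field, old[field])
                keys = self._index.get(entry)
                if keys is not None:
                    keys.discard(key)
                    if not keys:
                        del self._index[entry]

    def get(self, key):
        return self._data[key]

    def find(self, field, value):
        """Return the primary keys whose record has record[field] == value."""
        return sorted(self._index.get((field, value), ()))


if __name__ == "__main__":
    store = IndexedStore(indexed_fields=["city"])
    store.put("u1", {"name": "Ann", "city": "Portland"})
    store.put("u2", {"name": "Bob", "city": "Portland"})
    store.put("u1", {"name": "Ann", "city": "Columbus"})  # moving updates the index
    print(store.find("city", "Portland"))  # ['u2']
```

In a distributed store the index entries would themselves be stored objects, and the data write and the index write are not atomic, so a real implementation has to decide what happens when one succeeds and the other fails. Indexed values must also be hashable in this sketch.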