
Couchdb


Ruby, Rack and CouchDB. Over the weekend, I spent some time working on a Ruby + Rack + CouchDB project: three technologies that I know quite well but had never put to work together at the same time, at least not directly. Let’s call this Part I. Before we get started, let me introduce each component. Ruby: if you are reading this blog, you more than likely know at least a little bit about what I consider one of the most enjoyable programming languages out there. It’s also a very flexible language that lets us do some interesting things. I could have chosen Python to do the same project but that’s a whole different topic. For this project we will do something Ruby excels at: reopening existing classes and injecting more code. Rack: a webserver interface written in Ruby and inspired by Python’s WSGI. Goal: log Couch requests and analyze the data. Let’s say we have a Rails, Sinatra or Merb application and we are using CouchRest (maybe we are using CouchRest and ActiveRecord, but let’s ignore that for now).
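The excerpt stops before showing any Ruby, so here is only a rough analogue of "reopening a class and injecting more code", sketched in Python rather than Ruby. Everything in it is an assumption made for illustration: `CouchClient` is a hypothetical stand-in for a client library class (it is not CouchRest), and `log_calls` is a made-up helper that swaps a method for a wrapper recording each call.

```python
import functools


class CouchClient:
    """Hypothetical stand-in for a CouchDB client library class."""

    def get(self, path):
        # A real client would issue an HTTP GET here; this stub just echoes.
        return {"path": path}


def log_calls(cls, method_name, sink):
    """Replace cls.method_name with a wrapper that records each call in sink."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        sink.append((method_name, args))  # record the call for later analysis
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)


# Patch the class after the fact, without touching its source.
request_log = []
log_calls(CouchClient, "get", request_log)
result = CouchClient().get("/mydb/doc1")
```

After the patch, every existing and future `CouchClient` instance logs its `get` calls, which is roughly the effect the post is after when it reopens a library class to log Couch requests.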

W00t! Why Key-Value stores are like C (and why you might want to use o. What I like about the NoSQL crowd. Although I am not a big fan of the NoSQL movement (mostly because many of its advocates use arguments I do not agree with) there are a few things that I like about the NoSQL crowd and I want to write them down*. Most of what follows stems from discussions through the years with our DBA and some friends who are members of the “Greek Database Mafia”.

For more than two decades the dominance of the relational model (even though no commercial system fully implemented it†) was undisputed. Nobody ever got fired for choosing a commercial RDBMS for an application, whereas one would look suspicious if one dared to propose something different. This situation is no different from what Rob Pike described in “Systems Software Research is Irrelevant” for Operating Systems. For example, it took 10+ years for the R-Tree to enter commercial systems, although it was solving a real problem. You could say that databases outside academic research had come to a halt. You don’t believe me? Riak - A Decentralized Database. NoSQL and the Relational Model: don’t throw the baby out with th. There’s been a lot of buzz of late about the trend away from SQL and towards distributed databases; it seems the current crop of Relational Databases are increasingly being rejected in certain use cases, sometimes quite fairly.

It would be easy to come away from this criticism with the impression that the Relational Model is equally antiquated, not the right model, and no longer worth developing a full understanding of when thinking about data. This (pardon me) is wrong, but first let’s summarize why CouchDB and companions are taking over some of the turf of popular RDBMSes when it comes to scaling out. Problems scaling with RDBMSes: Scaling often requires you to drop down from the abstracted level of a normalised SQL schema and heavily optimise the physical representation of data. There’s also the fact that current RDBMSes tend to be built around stronger transactional models which aren’t compatible with some of the fault-tolerant distributed consensus cleverness that’s recently in vogue. My Thoughts on NoSQL.

Over the past few years, relational databases have fallen out of favor for a number of influential people in our industry. I'd like to weigh in on that, but before doing so, I'd like to give my executive summary of the events leading up to this movement: In the late nineties and early thousands, websites were mostly read-only--a publisher would create some content and users would consume that content. The data access patterns for these types of applications became very well-understood, and as a result many tools were created and much research and development was done to further develop these technologies. As the web has grown more social, however, more and more it's the people themselves who have become the publishers. And with that fundamental shift away from read-heavy architectures to read/write and write-heavy architectures, a lot of the way that we think about storing and retrieving data needed to change.

I love SQL. But none of this really matters. It also has atomic operations. CouchDB naked. Use anonymous type to create JSON objects. I’ve been playing around with CouchDB a bit today and in particular making use of SharpCouch, a library which acts as a wrapper around CouchDB calls. It is included in the CouchBrowse library, which is recommended as a good starting point for interacting with CouchDB from C# code. I decided to work out how the API worked by writing an integration test to save a document to the database.

The API is reasonably easy to understand and I ended up with the following test: In theory that should save the JSON object { key = “value” } to the database but it actually throws a 500 internal error in SharpCouch.cs: Debugging into that line, the Status property is set to ‘Protocol Error’ and a bit of Googling led me to think that I probably had a malformed client request. I tried the same test but this time created the document to save by creating an anonymous type and then converting it to a JSON object using the LitJSON library: Scouchdb gets View Server in Scala. CouchDB views are the real wings of the datastore: they go into every document and pull out exactly the data you have asked for through your queries.

The queries are different from the ones you do in an RDBMS using SQL - here you have all the state-of-the-art map/reduce being exercised through each of the cores that your server may have. One very good part of views in CouchDB is that the view server is a separate abstraction from the data store. Computation of views is delegated to an external server process that communicates with the main process over standard input/output using a simple line-based protocol. You can find more details about this protocol in the CouchDB wiki. The default implementation of the query server in CouchDB uses JavaScript running via Mozilla SpiderMonkey. Scouchdb gives one for Scala. Couch(test view(Views builder("power/power_lunch") build)) and Setting up the View Server. The view server is an external program which will communicate with the CouchDB server. Why I don't use CouchDB. CouchDB is lots of fun. It's really easy to install on a Mac using the CouchDBX package.
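To make the line-based view server protocol mentioned above a bit more concrete, here is a minimal sketch of such an external process, written in Python. It handles only three commands (`reset`, `add_fun`, `map_doc`), each arriving as a JSON array on its own line, and it leaves out reduce, logging and most error handling, so treat it as an illustration of the shape of the exchange rather than a working query server. One assumption to flag loudly: the map function source is taken to be a Python expression that returns a list of (key, value) pairs, and it is evaluated with `eval`, which is unsafe for untrusted input.

```python
import io
import json


def handle_command(line, map_funs):
    """Process one line of input and return the line to write back."""
    command, *args = json.loads(line)
    if command == "reset":
        map_funs.clear()  # forget all previously registered map functions
        return json.dumps(True)
    if command == "add_fun":
        # ASSUMPTION: the source is a Python expression such as
        # "lambda doc: [(doc['store'], doc['price'])]". eval is unsafe here.
        map_funs.append(eval(args[0]))
        return json.dumps(True)
    if command == "map_doc":
        doc = args[0]
        # One list of [key, value] pairs per registered map function.
        results = [[list(pair) for pair in fun(doc)] for fun in map_funs]
        return json.dumps(results)
    return json.dumps({"error": "unknown_command", "reason": command})


def serve(stdin, stdout):
    """Read commands line by line and answer each with one line of JSON."""
    map_funs = []
    for line in stdin:
        if line.strip():
            stdout.write(handle_command(line, map_funs) + "\n")
            stdout.flush()


# Simulated session; a real view server would call serve(sys.stdin, sys.stdout).
commands = io.StringIO(
    '["reset"]\n'
    '["add_fun", "lambda doc: [(doc[\'store\'], doc[\'price\'])]"]\n'
    '["map_doc", {"store": "A", "item": "tea", "price": 3}]\n'
)
replies = io.StringIO()
serve(commands, replies)
```

Because the only contract is lines of JSON over standard input and output, the same loop could be written in any language, which is what lets a Scala view server stand in for the default JavaScript one.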

It comes with a nice web UI so you can play around with it straight away. It leverages REST and JSON to provide a simple API that you can use from virtually any language. It has a great transactional model which lets you have full ACID semantics in a very lightweight way. So why don't I use it? Views. Unfortunately, you can only create a view from the original data; there is no way to create views whose input is other views. This means that you have to live with the same limitations as SQL queries (they are non-recursive, so they can't express transitive relationships), but you don't get the freedom to write queries ad hoc and have them execute efficiently (ad hoc views are supported, but there are no general-purpose indexes). The overview implies that CouchDB contains a port of Google's MapReduce framework, but the "real" MapReduce is much more flexible than CouchDB's implementation. CouchDB and Scala - Updates on scouchdb.

A couple of posts back, I introduced scouchdb, the Scala driver for CouchDB persistence. The primary goal of the framework is to offer non-intrusiveness in persistence, in the sense that the Scala objects can be absolutely oblivious to the underlying CouchDB existence. The last post discussed how Scala objects can be added, updated or deleted from CouchDB with the underlying JSON representation carefully veneered away from client APIs. Here is an example of the fetch API in scouchdb .. val sh = couch(test by_id(s_id, classOf[Shop])) The document is fetched as an instance of the Scala class Shop, which can then be manipulated using usual Scala machinery. The return type is a Tuple3, where the first two components are the id and revision that may be useful for doing future updates of the document, while sh._3 is the object retrieved from the data store. Returning tuples from a method is a typical Scala idiom that can give rise to some nice pattern matching code capsules ..

Temporary Views. Scouchdb - Google Code. Scouchdb has moved to GitHub: 0.4.1 is the latest here. There have been some incompatible changes in 0.5, which is hosted on GitHub. scouchdb offers a Scala interface to using CouchDB. Scala offers objects and classes as the natural way to abstract entities, while CouchDB stores artifacts as JSON documents. scouchdb makes it easy to use the object interface of Scala for persistence and management of Scala objects as JSON documents. Motivation: The primary motivation for making scouchdb is to offer a form of CouchDB driver to manipulate objects in a completely non-intrusive manner. The Scala objects are not CouchDB aware and remain completely transparent of any CouchDB dependency. Sample Session: Suppose I have a Scala class used to record item prices in various stores .. case class ItemPrice(store: String, item: String, price: Number) Here is a sample session that does this for a local CouchDB server running on localhost and port 5984 ..

Framework Inertia, CouchDB and the case of the missing R. There is nothing wrong in using frameworks, so long as I can justify the cause. A language like Java leaves a lot for frameworks to implement. While Java offers idioms that promote separation of interfaces from implementation, we still need dependency injection frameworks to get around the problems of wiring concrete implementations within the application. This is an encouraged practice in the Java world that leads to a codebase that is more flexible and unit testable. Nothing wrong with it .. However, the real problem with using frameworks is the inertia which your particular favorite framework brings on to you. A classic example is the inertia that dependency injection frameworks bring upon us. But I digress .. Object Relational Mapping is a layer of abstraction that is supposed to abstract away the impedance mismatch between your object oriented domain model and the relational persistence model.

Here is an example custom view from the CouchRest distribution .. And the Associations .. CouchDB and Me. Coderspiel. Relax with CouchDB. Why CouchDB? Why CouchDB Sucks. CouchDB really sucks at doing some things. That should come as no surprise, as every technology has its advantages and its drawbacks. The thing is, when a new technology comes out that looks really promising and cool, everyone writes about all of its advantages, and none of its drawbacks. Then, people start to use it for things it isn't very good at, and they are disappointed. In that spirit, I would like to talk about some of the things that (in my experience) CouchDB is absolutely not good at, and that you shouldn't try to use it for.

First, it doesn't support transactions in the way that most people typically think about them. That means enforcing uniqueness of one field across all documents is not safe. A classic example of this would be enforcing that a username is unique. Another consequence of CouchDB's inability to support the typical notion of a transaction is that things like incrementing or decrementing a value and saving it back are also dangerous. So does CouchDB suck? Standalone Applications with CouchDB. CouchDB Implementation. CouchDB is an Apache open source project. It is Damien Katz's brainchild and has a number of very attractive features based on very cool technologies.

Such as ... RESTful API; Schema-less document store (document in JSON format); Multi-Version-Concurrency-Control model; User-defined query structured as map/reduce; Incremental Index Update mechanism; Multi-Master Replication model; Written in Erlang (Erlang is good). There is a wide range of application scenarios where CouchDB can be a good solution fit, from an occasionally connected laptop-based application, to a high performance data cluster, and all the way up to virtual data storage in the cloud.
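The Multi-Version-Concurrency-Control item in the list above is easier to picture with a toy model. What follows is a sketch in Python of an in-memory store with revision checks. It is not CouchDB and not any client library. `ToyStore`, `Conflict` and `increment` are names invented for this illustration, and revisions are plain integers here, whereas CouchDB uses revision strings and reports a stale update as an HTTP 409 Conflict. The sketch models a single node only, so it says nothing about conflicts that arise between replicas.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    """In-memory stand-in for a single-node document store with revisions."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # return a copy so callers cannot mutate state

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(doc_id)  # the writer did not see the latest version
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


def increment(store, doc_id, field, retries=5):
    """Read-modify-write with retry: re-read and try again on conflict."""
    for _ in range(retries):
        rev, body = store.get(doc_id)
        body[field] = body.get(field, 0) + 1
        try:
            return store.put(doc_id, body, rev)
        except Conflict:
            continue
    raise Conflict(doc_id)


store = ToyStore()
store.put("counter", {"n": 0})
stale_rev, stale_body = store.get("counter")  # a slow writer reads first
increment(store, "counter", "n")              # another writer gets in ahead
try:
    store.put("counter", {"n": stale_body["n"] + 1}, stale_rev)
    stale_write_accepted = True
except Conflict:
    stale_write_accepted = False              # the stale update is rejected
increment(store, "counter", "n")
final_rev, final_body = store.get("counter")
```

The point of the design is that the store never locks anything: a writer simply states which revision it started from, and the burden of re-reading and retrying falls on the client.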

To understand the CouchDB design more deeply, I was very fortunate to have a conversation with Damien, who was kind enough to share many details with me. Here I want to capture what I have learnt from this conversation. Underlying Storage Structure: CouchDB is a “document-oriented” database where a document is a JSON string (with an optional binary attachment). E.g. Thoughts about CouchDB. Couchdb Wiki. CouchDB: The CouchDB Project.