Choosing a non-relational database; why we migrated from MySQL to MongoDB

Simple Service Bus / Message Queue with MongoDB | CaptainCodeman A service bus or message queue allows producers and subscribers to communicate asynchronously, so that a system can handle disconnects and processes being stopped and started, and can absorb peaks of demand beyond what the subscriber can immediately cope with. The queue acts as a buffer that the producer writes to and the subscriber reads from. There are lots of implementations, such as NServiceBus, MassTransit and Rhino Service Bus, and cloud-provided services such as Amazon’s Simple Queue Service and Windows Azure’s AppFabric Service Bus. Often, all that is needed is something fairly simple to buffer messages between processes and persist them. Now, I did have a look round first to see if anyone else had created something already and the closest I got was the post here: Why (and How) I Replaced Amazon SQS with MongoDB. Why are these features important? We don’t want the queue to just grow and grow and grow but would like to put a cap on the size. Query Flags: AwaitData TailableCursor
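The idea of a size-capped buffer that a subscriber tails can be sketched in plain Python. This is only an in-memory illustration, not MongoDB's API: `CappedQueue`, `publish` and `read_after` are hypothetical names, and a `deque` with `maxlen` stands in for a capped collection.

```python
from collections import deque


class CappedQueue:
    """In-memory stand-in for a size-capped message buffer (hypothetical helper)."""

    def __init__(self, max_messages):
        # A deque with maxlen silently drops the oldest entry once it is full,
        # which mimics putting a cap on the size of the queue.
        self._messages = deque(maxlen=max_messages)
        self._next_seq = 0

    def publish(self, body):
        """Append a message and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._messages.append((seq, body))
        return seq

    def read_after(self, last_seq):
        """Return messages newer than last_seq, as a tailing subscriber would."""
        return [(seq, body) for seq, body in self._messages if seq > last_seq]


queue = CappedQueue(max_messages=3)
for text in ["a", "b", "c", "d"]:
    queue.publish(text)

# The first message has been pushed out by the cap; the subscriber sees the rest.
pending = queue.read_after(-1)
```

A subscriber would remember the last sequence number it processed and call `read_after` again to pick up anything new.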

MySQL vs. MongoDB: Looking At Relational and Non-Relational Databases | Neon Rain Interactive When building a custom web application you need to consider the type of database that best suits the data. Here's a quick guide on the differences between MySQL (Relational) and MongoDB (Non-Relational / NoSQL). It was back in 2004 that Ruby on Rails first came out and popularized web application frameworks. What you might not know is that it also popularized ORM (Object-Relational Mapping) layers with its ActiveRecord object. For example, if you have a "post" object that represents a blog post, you can access its comments through the property "post.comments". Thankfully, we never jumped on to the ORM bandwagon. Data Representation MySQL represents data in tables and rows. MongoDB represents data as collections of JSON documents. If you think about it, a JSON document is very much like what you would be working with in your application layer. Querying The SQL in MySQL stands for Structured Query Language. MongoDB uses object querying. Relationships Transactions Schema Definition Performance

MongoDB as a Message Queue This is a live blog from MongoSV. Here’s a link to the entire series of posts. About.me uses MongoDB for different pieces of infrastructure, but this talk is just about queuing. They originally ran a 3-node RabbitMQ cluster, without disk persistence. Benefits: async ops, per-message (document) atomicity, batch processing, periodic processing, durability, sharding, operational familiarity (n.b. that would be the big one for me!). Use a capped collection? Implementation: Each message is a document. To consume a message they use a findAndModify to grab and remove a document atomically. That’s pretty much it! Benchmarks they ran showed MongoDB outperforming RabbitMQ for message creation by 19% (this is a single-node benchmark on a laptop, FYI). FindAndModify is blocking, so you will see high lock % with lots of concurrent consumers. Pros and Cons Pro: familiar, sharding, durability/persistence, low operational overhead, optional use of advanced queries.
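The consume step described above (grab and remove a document atomically) can be illustrated without a database. The sketch below is an in-memory stand-in, not the talk's implementation: `InMemoryQueue`, `claim` and `drain` are hypothetical names, and a lock plays the role that findAndModify's atomicity plays in MongoDB.

```python
import threading


class InMemoryQueue:
    """Each message is a dict ("document"); claim() finds and removes one atomically."""

    def __init__(self):
        self._docs = []
        self._lock = threading.Lock()

    def publish(self, doc):
        with self._lock:
            self._docs.append(doc)

    def claim(self):
        # Find the oldest message and remove it while holding the lock, so no
        # two consumers can ever receive the same message.
        with self._lock:
            return self._docs.pop(0) if self._docs else None


def drain(queue, seen):
    """Consumer loop: keep claiming messages until the queue is empty."""
    while True:
        doc = queue.claim()
        if doc is None:
            return
        seen.append(doc["id"])


queue = InMemoryQueue()
for i in range(100):
    queue.publish({"id": i})

seen = []
workers = [threading.Thread(target=drain, args=(queue, seen)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because every claim takes the same lock, consumers serialize on it, which is the same trade-off the notes mention for concurrent findAndModify calls. With the pymongo driver, the analogous single call would likely be `Collection.find_one_and_delete`.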

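MapReduce-MPI Library Speeding up partial-match search in mongoDB These days I find myself developing with mongoid on Rails3 more and more. Because the number of records is very large, a system that runs at a decent speed in the test environment can turn out to be useless in production: it never answers, like a lifeless corpse. Take partial-match search. Suppose there are many rows such as aaa aba bba ccc and you want to find the ones that contain "a". mongoDB supports Perl-style regular expressions, so with wildcards you can write it like this: /. Concretely, you put something like this in the model. The period (.) So I decided to build a table for indexing. When a record contains the string "abc", we also record "bc" and "c" as index entries for it. Concretely: abc, abc bc, abc c, abc Then we put an index on the first column. The important point is that we do not record every substring. When searching, we do not use wildcards, because they make the search slow. We only check whether the string we are looking for appears at the beginning. It looks like this: /^search string/ In other words, we look for rows where the search string is at the start. You might think that writing /^search string.*/ would do just as well, and it does, but it is slightly slower than the form above. Put something like this in the model. Then take the string in the second column and the search is done. This gives reasonably fast search even with a large amount of data. Incidentally, this idea is called a SuffixArray, and there are many improved variants, so it could probably be made faster still, but I will stop here. Also, splitting this index table with sharding makes it faster. Of course, that is pointless unless you shard on the first column as the key.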
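The suffix-table idea can be sketched in plain Python. This is a minimal in-memory illustration under stated assumptions, not the post's mongoid model: `build_suffix_index` and `search` are hypothetical names, a sorted list stands in for the indexed first column, and a binary search stands in for the anchored /^.../ query.

```python
from bisect import bisect_left


def build_suffix_index(records):
    """Return sorted (suffix, original record) pairs, one per suffix of each record."""
    entries = []
    for record in records:
        for start in range(len(record)):
            entries.append((record[start:], record))
    entries.sort()  # sorting plays the role of the index on the first column
    return entries


def search(index, needle):
    """Find records containing needle by a prefix match on the suffix column."""
    # (needle,) sorts before every (needle..., record) pair, so bisect_left
    # lands on the first suffix that could start with needle.
    pos = bisect_left(index, (needle,))
    matches = set()
    while pos < len(index) and index[pos][0].startswith(needle):
        matches.add(index[pos][1])  # second column: the original string
        pos += 1
    return sorted(matches)


index = build_suffix_index(["aaa", "aba", "bba", "ccc"])
with_a = search(index, "a")
with_ba = search(index, "ba")
```

The cost is storage: a record of length n contributes n index rows, in exchange for replacing an unanchored scan with a prefix lookup.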

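nosql - Non-Relational Database Design Making MongoDB Run Well: Understanding Its Mechanisms and Behavior - @doryokujin blog: The study session continues even after you get home This is @doryokujin. @kuwa_tw, who is very influential in this industry, dissed MongoDB at a certain study session, and fearing that MongoDB's very survival was at stake, I hurriedly wrote this post. (Just kidding.) I think it is a good "MongoDB gotchas" style resource that neatly summarizes the troubles you run into when using MongoDB. Today I would like to present solutions to the troubles described there. MongoDB automatically creates databases and collections that do not exist When starting the mongo shell, you usually run something like $ mongo, specifying the host name and port of the currently running mongod. Furthermore, MongoDB can operate on a database that does not exist. MongoDB is a very meddlesome DB, so it automatically takes care of many things even when we operate it lazily. ・mongo Please be careful. Helpful automatic recovery during Replication MongoDB's Replication has a really nice feature. A MongoDB Replica Set, in order to distinguish its Set members from those of other Replica Sets, ... Incidentally, this automatic recovery does not kick in if mongod is started with a different host name or port. Chunks move around too much / the default chunk is too large MongoDB has a feature called Sharding that splits data across multiple servers. MongoDB's Sharding has two meddlesome but convenient automatic features. Automatic Sharding For example, suppose "name" is specified as the Shard Key. In fact the default Chunk size is 200MB, but at the start of Sharding it is lowered to 64MB. mongos --port 10000 --configdb host1:10001 --chunkSize 500 Here, setting --chunkSize to 1 [MB] makes chunks split at a far faster pace than the default. Automatic Balancing Various Sharding problems However, you should not expect these features to work perfectly. Checking memory status Summary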

Map-Reduce — MongoDB Manual 2.6.4 Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce database command. Consider the following map-reduce operation: In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs. All map-reduce functions in MongoDB are JavaScript and run within the mongod process. Note For most aggregation operations, the Aggregation Pipeline provides better performance and a more coherent interface. Map-Reduce JavaScript Functions In MongoDB, map-reduce operations use custom JavaScript functions to map, or associate, values to a key. The use of custom JavaScript functions provides flexibility to map-reduce operations. Map-Reduce Behavior In MongoDB, the map-reduce operation can write results to a collection or return the results inline.
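The flow described above (filter by a query, emit key-value pairs in the map phase, then reduce the values for each key) can be sketched in plain Python. This is not the mapReduce command itself, which takes JavaScript functions; the `map_reduce` helper and the sample `orders` data are made up for illustration.

```python
from collections import defaultdict


def map_reduce(documents, query, map_fn, reduce_fn):
    """Apply map_fn to documents matching query, then reduce the values per key."""
    grouped = defaultdict(list)
    for doc in documents:
        if query(doc):                      # only matching documents are mapped
            for key, value in map_fn(doc):  # the map phase emits key-value pairs
                grouped[key].append(value)
    # The reduce phase condenses all values emitted for a key into one result.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical sample data: total order amount per customer for status "A".
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

totals = map_reduce(
    orders,
    query=lambda doc: doc["status"] == "A",
    map_fn=lambda doc: [(doc["cust_id"], doc["amount"])],
    reduce_fn=lambda key, values: sum(values),
)
```

Returning a dictionary corresponds loosely to returning results inline; writing the results to a collection would be the other option the manual mentions.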

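What terrible things happen when you use NoSQL in place of an RDB? The author of PARTAKE explains December 21, 2010 NoSQL is attracting attention in the database world right now. Key-value data stores in particular are already in real use, including Google's BigTable, Cassandra, which Facebook and Twitter use internally, and SimpleDB, which the Amazon cloud offers. So what happens if you build a system using NoSQL in place of a relational database? How does building a system with NoSQL differ from building one with a relational database? Terrible cases that occur when using NoSQL The theme of the presentation is "showing through first-hand experience what terrible things happen when you use NoSQL in place of an RDB". Specifically, it covers the experience of developing "PARTAKE", an event support tool for announcing events, recruiting participants and accepting sign-ups, on top of NoSQL. The back end uses Apache Cassandra 0.6.x, with no relational database. First, terrible case number one. "NoSQL is schemaless, so you can handle data more flexibly than in a relational database, right?" "Huh?" For example, when you want to add more search targets. In a relational database, you just set an index on that column. In Cassandra, you need to add a new Key. Terrible case number two: when you want to update two or more keys at the same time. Cassandra of course has neither Commit nor Rollback, so consistency breaks down. Terrible case number three: Cassandra cannot even count. When count-up requests arrive from multiple clients almost simultaneously, it is hard to guarantee consistency properly inside Cassandra. That said, rumor has it that 0.8.x will finally be able to count. Summary. Even so, PARTAKE is running. We got comments from the author We asked Shinya Kawanaka, the author of this presentation and the developer of PARTAKE, for his comments on NoSQL. Q: You presented terrible cases with NoSQL; how do you view these limitations? Q: Were you aware of these terrible cases beforehand? Q: About PARTAKE, which was actually implemented using NoSQL.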
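The counting problem in the third case is the classic lost update. The sketch below replays it deterministically against a plain dictionary; it is a generic illustration, not Cassandra's API, and the `attendees` key and client names are hypothetical.

```python
# A plain dict stands in for a key-value store with no transactions or
# atomic increment (an assumption made purely for illustration).
store = {"attendees": 0}

# Two clients read the counter at almost the same time...
seen_by_client_a = store["attendees"]
seen_by_client_b = store["attendees"]

# ...and each writes back "what I read plus one".
store["attendees"] = seen_by_client_a + 1
store["attendees"] = seen_by_client_b + 1

# Two sign-ups happened, but the second write overwrote the first.
final_count = store["attendees"]
expected_count = 2
```

Without an atomic increment or a transaction around the read and the write, one of the two sign-ups is silently lost.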