Apache Kafka

Apache ActiveMQ ™ -- Index

Log: Real Time Log Processing

I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. This has been an interesting experience: we built, deployed, and run to this day a distributed graph database, a distributed search backend, a Hadoop installation, and a first and second generation key-value store.

One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. You can't fully understand databases, NoSQL stores, key-value stores, replication, Paxos, Hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them.

Part One: What Is a Log?

A log is perhaps the simplest possible storage abstraction. So, a log is not all that different from a file or a table.

The End
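The excerpt stops at the definition, so here is a minimal Python sketch of the idea. It assumes an in-memory list in place of durable storage, and `Log` and `replay` are hypothetical names, not any library's API. A log gives each appended record a sequential offset, and readers consume records in offset order. Replaying the log from offset 0 rebuilds a key-value table, which is one way to see why a log is "not all that different from a file or a table."

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Log:
    """In-memory append-only log; a list stands in for durable storage (sketch)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end and return its offset (0, 1, 2, ...)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int) -> bytes:
        """Return the record stored at `offset`."""
        return self._records[offset]

    def read_from(self, offset: int) -> Iterator[Tuple[int, bytes]]:
        """Yield (offset, record) pairs in log order, starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


def replay(log: Log) -> Dict[str, str]:
    """Rebuild a key-value 'table' by applying every `key=value` record in order."""
    table: Dict[str, str] = {}
    for _, record in log.read_from(0):
        key, value = record.decode().split("=", 1)
        table[key] = value  # later records overwrite earlier ones
    return table


if __name__ == "__main__":
    log = Log()
    log.append(b"name=alice")
    log.append(b"name=bob")
    log.append(b"city=paris")
    print(replay(log))  # {'name': 'bob', 'city': 'paris'}
```

A reader that remembers the last offset it processed can resume with `read_from(last + 1)`. That single integer is all the position state the reader needs.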

Apache Gora™

Apache Qpid™: Open Source AMQP Messaging

RabbitMQ - Messaging that just works

Apache ZooKeeper - Home

A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo... - Muriel's Tech Blog

Lately I performed a message queue benchmark, comparing several queuing frameworks (RabbitMQ, ActiveMQ…). Those benchmarks are part of a complete study conducted by Adina Mihailescu, and everything was presented at the April 2013 riviera.rb meet-up. You should definitely peek into Adina’s great presentation available online right here.

So I wanted to benchmark brokers using different protocols. I decided to build a little Rails application piloting a binary that was able to enqueue and dequeue items taken from a MySQL database. I considered the following scenarios:

Scenario A: Enqueuing 20,000 messages of 1024 bytes each, then dequeuing them afterwards.
Scenario B: Enqueuing and dequeuing 20,000 messages of 1024 bytes each simultaneously.
Scenario C: Enqueuing and dequeuing 200,000 messages of 32 bytes each simultaneously.
Scenario D: Enqueuing and dequeuing 200 messages of 32768 bytes each simultaneously.

I decided to bench the following brokers:

(Result charts: Scenario A, Scenario B, Scenario C, Scenario D)
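The original harness (a Rails app driving a separate binary against real brokers) is not reproduced here. The Python sketch below encodes only the four scenarios as data and times them against an in-process `queue.Queue`. That queue is an assumed stand-in for a broker, with no network, persistence, or protocol overhead, so its timings say nothing about ActiveMQ, RabbitMQ, or the other brokers. `Scenario`, `SCENARIOS`, and `run` are hypothetical names. Scenario A fills the queue before draining it; B, C, and D run a producer thread and a consumer thread concurrently.

```python
import queue
import threading
import time
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scenario:
    name: str
    count: int          # number of messages
    size: int           # payload size in bytes
    simultaneous: bool  # enqueue and dequeue at the same time?


SCENARIOS = [
    Scenario("A", 20_000, 1024, False),
    Scenario("B", 20_000, 1024, True),
    Scenario("C", 200_000, 32, True),
    Scenario("D", 200, 32768, True),
]


def run(s: Scenario) -> Tuple[int, float]:
    """Run one scenario against an in-process queue; return (received, seconds)."""
    q: "queue.Queue[bytes]" = queue.Queue()
    payload = b"x" * s.size
    received = 0

    def produce() -> None:
        for _ in range(s.count):
            q.put(payload)

    def consume() -> None:
        nonlocal received
        for _ in range(s.count):
            q.get()  # blocks until a message is available
            received += 1

    start = time.perf_counter()
    if s.simultaneous:
        threads = [threading.Thread(target=produce), threading.Thread(target=consume)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        produce()  # enqueue everything first...
        consume()  # ...then drain the queue
    return received, time.perf_counter() - start


if __name__ == "__main__":
    for s in SCENARIOS:
        n, secs = run(s)
        print(f"Scenario {s.name}: {n} messages in {secs:.3f}s")
```

To compare real brokers, the bodies of `produce` and `consume` would be swapped for a client library's publish and receive calls. The scenario table and the timing logic stay the same.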

Readings in distributed systems

This post is a work in progress. Inspired by a recent purchase of the Red Book, which provides a curated list of important papers around database systems, I’ve decided to begin assembling a list of important papers in distributed systems. Similar to the Red Book, I’ve broken the papers out into a series of categories, each highlighting a progression of related ideas over time and focused on a specific area of research within the field. Keeping with the tradition of the Red Book, I’ve included both papers which resulted in very successful systems and/or techniques and papers which introduced a concept that was either immediately dismissed or proven incorrect.

Consensus: The problems of establishing consensus in a distributed system.
Consistency: Types of consistency, and practical solutions for ensuring atomic operations across a set of replicas.
Conflict-free data structures: Studies on data structures which do not require coordination to ensure convergence to the correct value.
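The reading list only names the conflict-free category. The classic grow-only counter (G-Counter) shows what "convergence without coordination" means in practice. This is a minimal Python sketch, and `GCounter` is a hypothetical class, not a library API. Each replica increments only its own slot. Merging takes the per-replica maximum, which is commutative, associative, and idempotent, so replicas converge no matter how often or in what order they exchange state.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot (sketch)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum; safe to apply repeatedly and in any order."""
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)  # concurrent updates on two replicas, no coordination
    b.increment(2)
    a.merge(b)      # exchange state in either direction
    b.merge(a)
    print(a.value(), b.value())  # 5 5
```

Decrements would break the "only grows" property that makes `max` a valid merge. That is why counters supporting decrements are usually built from two G-Counters, one for increments and one for decrements.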

Apache Sirona - Apache Sirona

Code rant: Message Queue Shootout!

I’ve spent an interesting week evaluating various Message Queue products. The motivation behind this is a client that has somewhat high performance requirements. They have bursts of over a million simultaneous messages. Currently they’re using a SQL Server based solution, but it’s not ideal, and I’m suggesting they look at Message Queuing products as an alternative.

In order to get a completely unscientific feel for the performance of some likely contenders, I put together a little test. Each queue would be asked to send one million 1K messages and receive them again. The candidates are: MSMQ.

Getting all four MQ products up and running was fun. ZeroMQ, with its brokerless architecture, doesn’t require any server process or runtime.

So without further chit-chat, here are the results. As you can see, there’s ZeroMQ and the others. To be honest, I was hoping for more from Rabbit. If you’d like to run the tests for yourself, my test code is on GitHub here.
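For a sense of the workload's size, here is a back-of-the-envelope Python sketch. It assumes "1K" means 1024 bytes, and `payload_volume` and `throughput` are hypothetical helper names. One million such messages carry 1,024,000,000 bytes (about 977 MiB) of payload in each direction, before any protocol or framing overhead. Dividing by a run's elapsed time gives the messages-per-second figures a shootout like this compares. The 10-second figure below is purely illustrative, not a measured result.

```python
from typing import Tuple


def payload_volume(messages: int, size_bytes: int) -> int:
    """Total payload bytes sent, excluding protocol and framing overhead."""
    return messages * size_bytes


def throughput(messages: int, size_bytes: int, seconds: float) -> Tuple[float, float]:
    """Return (messages per second, MiB of payload per second) for one run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    rate = messages / seconds
    mib_per_sec = payload_volume(messages, size_bytes) / (1024 * 1024) / seconds
    return rate, mib_per_sec


if __name__ == "__main__":
    print(payload_volume(1_000_000, 1024))    # 1024000000
    # Hypothetical 10-second run (illustrative only, not a benchmark result):
    print(throughput(1_000_000, 1024, 10.0))  # (100000.0, 97.65625)
```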