
Apache Kafka

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
Fast: A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime.
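
To make "publish-subscribe messaging rethought as a commit log" concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the names CommitLog, Subscriber, append, read and poll are hypothetical, and distribution, persistence and partitioning are left out entirely. It only illustrates the idea that records are appended once and each subscriber tracks its own read offset.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only in-memory log (hypothetical toy, not a Kafka client)."""
    records: list = field(default_factory=list)

    def append(self, message):
        # Earlier records are never rewritten; the offset is the record's position.
        self.records.append(message)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        # Reading leaves the log untouched, so any number of readers can replay it.
        return self.records[offset:offset + max_records]


@dataclass
class Subscriber:
    """Each subscriber keeps its own offset into the shared log."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

billing = Subscriber(log)
analytics = Subscriber(log)

first_billing = billing.poll(max_records=2)  # billing reads part of the log
all_analytics = analytics.poll()             # analytics independently reads everything
rest_billing = billing.poll()                # billing resumes from its own offset
```

Because reading does not consume records, both subscribers see every message, which is the publish-subscribe behaviour; a queue with competing consumers would instead hand each message to only one reader.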

http://kafka.apache.org/

Related: Service Bus, Startup Learning

Service Bus Queues, Topics, and Subscriptions
Queues offer First In, First Out (FIFO) message delivery to one or more competing consumers. That is, messages are typically expected to be received and processed by the receivers in the temporal order in which they were added to the queue, and each message is received and processed by only one message consumer. A key benefit of using queues is to achieve “temporal decoupling” of application components.

Eight hot technologies that were built in Scala
With Scala Days 2015 San Francisco just around the corner (and only 15% of tickets left), it has got me thinking quite a bit about how much the ecosystem has expanded since I first became involved with the conference in 2011. The rapidly-growing Scala community has evolved from what was largely a very academic and research-oriented crew, with some early champions like Twitter and Foursquare, to a language that’s become a standard for enterprises, start-ups and universities alike. But even as companies and individuals use Scala to build their own new ideas, they also utilize other excellent tools like Play Framework, Akka, Apache Spark and Kafka...which are not only some of the hottest tools and projects on the market right now, but also intentionally built in Scala (for many reasons…)

22 free tools for data visualization and analysis
You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience. There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.

A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo... - Muriel's Tech Blog
Lately I performed a message queue benchmark, comparing several queuing frameworks (RabbitMQ, ActiveMQ…). Those benchmarks are part of a complete study conducted by Adina Mihailescu, and everything was presented at the April 2013 riviera.rb meet-up. You should definitely peek into Adina’s great presentation available online right here. So I wanted to benchmark brokers, using different protocols: I decided to build a little Rails application piloting a binary that was able to enqueue/dequeue items taken from a MySQL database. I considered the following scenarios:
Scenario A: Enqueuing 20,000 messages of 1024 bytes each, then dequeuing them afterwards.
Scenario B: Enqueuing and dequeuing simultaneously 20,000 messages of 1024 bytes each.
Scenario C: Enqueuing and dequeuing simultaneously 200,000 messages of 32 bytes each.
Scenario D: Enqueuing and dequeuing simultaneously 200 messages of 32768 bytes each.
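
The shape of the four scenarios above can be sketched in a few lines of Python. This is not the author's Rails harness and it does not talk to any real broker: the standard-library queue.Queue stands in for the broker, and run_sequential, run_concurrent and SCENARIOS are hypothetical names chosen for illustration. Only the message counts and sizes come from the scenario list, so the timings it prints say nothing about ActiveMQ, RabbitMQ or the others.

```python
import queue
import threading
import time


def run_sequential(count, size):
    """Scenario A shape: enqueue every message, then dequeue them all."""
    q = queue.Queue()  # in-process stand-in for a broker (assumption)
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        q.put(payload)
    received = 0
    while received < count:
        q.get()
        received += 1
    return received, time.perf_counter() - start


def run_concurrent(count, size):
    """Scenarios B-D shape: one producer and one consumer run at the same time."""
    q = queue.Queue()
    payload = b"x" * size
    received = []

    def consume():
        for _ in range(count):
            received.append(len(q.get()))

    consumer = threading.Thread(target=consume)
    start = time.perf_counter()
    consumer.start()
    for _ in range(count):
        q.put(payload)
    consumer.join()
    return len(received), time.perf_counter() - start


# (runner, message count, message size in bytes), taken from the scenario list.
SCENARIOS = {
    "A": (run_sequential, 20_000, 1024),
    "B": (run_concurrent, 20_000, 1024),
    "C": (run_concurrent, 200_000, 32),
    "D": (run_concurrent, 200, 32_768),
}

if __name__ == "__main__":
    for name, (runner, count, size) in SCENARIOS.items():
        received, seconds = runner(count, size)
        print(f"Scenario {name}: {received} messages in {seconds:.3f}s")
```

To measure a real broker, the put and get calls would be replaced by that broker's client library, and producer and consumer would normally run as separate processes.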

Log: Real Time Log Processing
I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. This has been an interesting experience: we built, deployed, and run to this day a distributed graph database, a distributed search backend, a Hadoop installation, and a first and second generation key-value store. One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log.

Azure Queues and Service Bus Queues - Compared and Contrasted
Updated: March 11, 2015
Authors: Valery Mizonov, Seth Manheim, and Abhishek Lal
Contributors: Brad Calder, Jai Haridas, Jason Hogg, Jeff Irwin, Jaganathan Thangavelu, Kartik Paramasivam, Todd Holmquist-Sutherland, and Ruppert Koch

Lifting Machine Learning into Akka Streams
Introduction
In this post we focus on how to integrate machine learning (ML) components (e.g. decision trees, Bayesian networks, SVMs, etc.) into the Muvr application. In particular, we focus on how events arriving from multiple sensors (positioned on an exercising individual) are transformed into classification events (for user feedback). At this stage of the Muvr development, we are unsure as to which ML components will be effective for generating classification events, and so design a general solution whereby:
segmented events (from multiple sensors) are streamed through a collection of ML classifiers to generate enriched event streams
the enriched event streams are continuously monitored for recognisable patterns
any recognisable pattern is passed through a decision engine that generates the final classification/notification events for user feedback.
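
The three stages listed above can be sketched with plain Python generators. This swaps in generators for Akka Streams and is not the Muvr code: the function names enrich, monitor and decide, the two toy classifiers, the event fields accel and bpm, and the rule that a pattern is two consecutive events with identical labels are all assumptions made for illustration.

```python
def enrich(events, classifiers):
    """Stage 1: run each segmented event through every classifier."""
    for event in events:
        labels = {name: classify(event) for name, classify in classifiers.items()}
        yield {**event, "labels": labels}


def monitor(enriched, pattern_length=2):
    """Stage 2: emit a pattern when the same labels repeat on consecutive events."""
    run_labels, run_length = None, 0
    for event in enriched:
        if event["labels"] == run_labels:
            run_length += 1
        else:
            run_labels, run_length = event["labels"], 1
        if run_length == pattern_length:
            yield run_labels


def decide(patterns):
    """Stage 3: toy decision engine that notifies only when all classifiers agree."""
    for labels in patterns:
        votes = set(labels.values())
        if len(votes) == 1:
            yield f"classified: {votes.pop()}"


# Hypothetical classifiers; a real system would plug in trained models here.
classifiers = {
    "threshold": lambda e: "curl" if e["accel"] > 1.0 else "rest",
    "heart_rate": lambda e: "curl" if e["bpm"] > 100 else "rest",
}

# Made-up sensor readings, already segmented into events.
events = [
    {"accel": 1.4, "bpm": 120},
    {"accel": 1.6, "bpm": 125},
    {"accel": 0.2, "bpm": 110},
    {"accel": 0.1, "bpm": 80},
    {"accel": 0.1, "bpm": 78},
]

notifications = list(decide(monitor(enrich(events, classifiers))))
print(notifications)  # prints ['classified: curl', 'classified: rest']
```

Keeping the classifiers in a dictionary means a component can be added or dropped without touching the monitoring or decision stages, which matches the stated goal of not committing to particular ML components yet.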

Big Data Is As Misunderstood As Twitter Was Back In 2008
Boonsri Dickinson, Business Insider
In 2008, when Howard Lindzon started StockTwits, no one knew what Twitter was. Obviously, that has changed.

de rant: Message Queue Shootout!
I’ve spent an interesting week evaluating various Message Queue products. The motivation behind this is a client that has somewhat high performance requirements. They have bursts of over a million simultaneous messages. Currently they’re using a SQL server based solution, but it’s not ideal, and I’m suggesting they look at Message Queuing products as an alternative. In order to get a completely unscientific feel for the performance of some likely contenders, I put together a little test. Each queue would be asked to send one million 1K messages and receive them again.

Readings in distributed systems
This post is a work in progress. Inspired by a recent purchase of the Red Book, which provides a curated list of important papers around database systems, I’ve decided to begin assembling a list of important papers in distributed systems. Similar to the Red Book, I’ve broken each group of papers out into a series of categories, each highlighting a progression of related ideas over time focused in a specific area of research within the field. Keeping the tradition of the Red Book, I’ve included both papers which resulted in very successful systems and/or techniques, as well as papers which introduced a concept which was either immediately dismissed or proven incorrect.

Related:  distributed systems