Big data

Why Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management - This post originally appeared in the developer tech blog on June 10, 2014.

Why Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management

If you’re in the business of cloud-based log management, every aspect of your service needs to be designed for reliability and scale. Here’s what Loggly faces daily: a massive stream of incoming events, with bursts reaching 100,000+ events per second and lasting several hours; the need for a “no log left behind” policy, because every log has the potential to be the critical one and our customers can’t afford for us to drop a single one; and operational troubleshooting use cases that demand near real-time indexing and time series index management. At Loggly, our growth has been both amazing and challenging.

OpenStack Juno Design Summit outcomes for Keystone. This is a summary of the discussions, design decisions, goals, and direction that came out of the OpenStack Juno Design Summit in Atlanta (spring 2014) with regard to Keystone.

OpenStack Juno Design Summit outcomes for Keystone

Consider this to be a sequel to my similar coverage of the Icehouse summit. (This is Juno, Georgia.

Building a Recommendation Engine. One of the biggest problems facing Assembly users has been navigating the wide variety of activity for what interests them the most.

Building a Recommendation Engine

With such a diverse community of developers, designers, copywriters, and virtually every niche, and with so many different ongoing projects, finding the right fit can be difficult for a new user. Knowing how pertinent a particular product is requires delving into its guts in a process of trial-and-error. Even for the most compelling products, perusing the list of available bounties is an intensive, hit-or-miss process.

The copywriter who wants to help out with Coderwall must search through countless ‘backend’ and ‘ruby’ bounties before finding what they want. Other users may not know exactly what they want yet. Assembly has built a new suggestion system to help address these issues. An object, whether it’s a user, product, or bounty, can be attached to multiple ‘marks’.

Josephmisiti/awesome-machine-learning.

1 Platform: Trusted Application Development Platform.

Amazon Kinesis. We're running 0.7 and most of our problems have been around partition rebalancing.

Amazon Kinesis

I'm not the primary engineer on this, but here's my understanding: If we add nodes to an existing Kafka cluster, those nodes own no partitions and therefore send/receive no traffic. A rebalancing event must occur for these servers to become active. Bouncing Kafka on one of the active nodes is one way to trigger such an event. Fortunately, cluster resizing is infrequent.
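The behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the round-robin assignment, the broker names, and the `assign_partitions` helper are hypothetical and are not Kafka's actual rebalancing logic. It shows only that a newly added node owns nothing until a rebalancing event redistributes the partitions.

```python
from collections import defaultdict


def assign_partitions(partitions, brokers):
    """Toy round-robin assignment (hypothetical, not Kafka's algorithm)."""
    ownership = defaultdict(list)
    for i, partition in enumerate(partitions):
        ownership[brokers[i % len(brokers)]].append(partition)
    # Return a plain dict with an entry for every broker, even empty ones.
    return {broker: ownership[broker] for broker in brokers}


partitions = list(range(6))
brokers = ["broker-1", "broker-2"]  # hypothetical node names
ownership = assign_partitions(partitions, brokers)

# Adding a node does not move any partitions by itself:
# the new node owns nothing and therefore sees no traffic.
brokers.append("broker-3")
ownership.setdefault("broker-3", [])
before = {broker: list(owned) for broker, owned in ownership.items()}

# Only a rebalancing event redistributes partitions across all nodes.
after = assign_partitions(partitions, brokers)

print("before rebalance:", before)
print("after rebalance: ", after)
```

In this model the third node stays idle until `assign_partitions` is rerun, which mirrors why a rebalancing event has to be triggered before added servers become active.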

When ZooKeeper detects a node failure (however brief), the node is removed from the active pool and the partitions are rebalanced. As a result, we have to bounce Kafka on an active server every few weeks in response to network blips. 0.8 is supposed to handle this better, but we'll see.

RabbitMQ: RabbitMQ vs Kafka: which one for durable messaging with good query features?

Is AWS Kinesis just SQS on steroids?

Kafka + Storm = Realtime Data at GumGum. By Vaibhav Puranik, Principal Engineer. Usually our CTO Ken Weiner never refuses to go on the usual GumGum afternoon coffee walk at 3:30.

Kafka + Storm = Realtime Data at GumGum

But on that day, something was fishy. He refused. When I came back I saw him salivating at the screen. I walked to his desk and he quickly pointed me to a website on his screen. Being an engineer at heart, I was overjoyed at the prospect of doing challenging work. The makers of Kafka, engineers at LinkedIn, believed that sequential disk access can sometimes be faster than RAM! Kafka uses ZooKeeper for coordination. That’s what we decided to do at GumGum. On the consumer side, coordination is important, so the ZooKeeper-based consumer is a better choice unless you have a single consumer or your own coordination system.

Kafka created multiple streams of events. Meanwhile, we had started hearing about Storm from friends and the big data community. Storm is designed specifically for processing unending streams.

Digest of interesting news and materials from the PHP world No. 42 (June 1-16, 2014). We present to your attention another selection of links to news and materials.

Digest of interesting news and materials from the PHP world No. 42 (June 1-16, 2014)

Enjoy reading! News and releases.

A Programmer's Guide to Data Mining.

Deep Learning 101. Deep learning has become something of a buzzword in recent years with the explosion of 'big data', 'data science', and their derivatives mentioned in the media.

Deep Learning 101

Justifiably, deep learning approaches have recently blown other state-of-the-art machine learning methods out of the water for standardized problems such as the MNIST handwritten digits dataset. My goal is to give you a layman's understanding of what deep learning actually is so you can follow some of my thesis research this year as well as mentally filter out news articles that sensationalize these buzzwords. Imagine you are trying to recognize someone's handwriting - whether they drew a '7' or a '9'. From years of seeing handwritten digits, you automatically notice the vertical line with a horizontal top section. Current machine learning algorithms' performance depends heavily on the particular features of the data chosen as inputs.

Big Data. Cloud Storage.

NoSQL