
Storm, distributed and fault-tolerant realtime computation

So, you want to build a recommendation engine? At PredictiveIntent, we had a lot of enquiries from people at companies who were not sure whether to build their own recommendation engine, plug in a lightweight recommendations solution, or dedicate some time to implementing "personalisation" properly. Our advice usually consists of three main points:

Focus on your goals – will spending too much time building a recommendation engine take your development cycle off track?

The importance of technology – throwing a few lines of JavaScript code on a page and manually uploading data feeds might be sufficient for the time being, but it will restrict you from innovating with recommendations.

Don't underestimate performance – can you support 99.95% uptime with multiple redundancy systems, 60-millisecond response times, peak loads of more than 100 transactions per second, and more?

However, there are many different variations, and they fall into two main camps: Recommendations and Personalisation.
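To put the 99.95% uptime figure in perspective, a quick back-of-the-envelope calculation (mine, not the original post's) shows how small the implied downtime budget is:

```python
# Downtime budget implied by a 99.95% uptime target over a 30-day month.
uptime = 0.9995
minutes_per_month = 30 * 24 * 60            # 43,200 minutes in a 30-day month
budget = (1 - uptime) * minutes_per_month   # minutes of allowed downtime

print(round(budget, 1))                     # about 21.6 minutes per month
```

In other words, a single bad deploy or an unmonitored crash can consume a month's worth of the error budget, which is why the post stresses redundancy.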

Kafka

1. Getting Started

1.1 Introduction

Kafka is a distributed, partitioned, replicated commit log service. What does all that mean? First let's review some basic messaging terminology: Kafka maintains feeds of messages in categories called topics. Communication between the clients and the servers is done with a simple, high-performance, language-agnostic TCP protocol.

Topics and Logs

Let's first dive into the high-level abstraction Kafka provides: the topic. A topic is a category or feed name to which messages are published. Each partition is an ordered, immutable sequence of messages that is continually appended to; in effect, a commit log. The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time. In fact, the only metadata retained on a per-consumer basis is the position of the consumer in the log, called the "offset". The partitions in the log serve several purposes, covered in the later sections on Distribution, Producers, Consumers, and Guarantees.
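The partition-and-offset model described above can be sketched in a few lines of plain Python. This is a toy model of the abstraction, not Kafka's actual API: each partition is an append-only sequence, the broker retains every message regardless of consumption, and the only per-consumer state is an offset.

```python
# Toy model of Kafka's log abstraction: a partition is an ordered,
# immutable, append-only sequence of messages; consumers track only
# their own offset, so slow consumers never block fast ones.

class Partition:
    def __init__(self):
        self._log = []                    # ordered, append-only message log

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1         # offset assigned to this message

    def read(self, offset):
        return self._log[offset]

    def __len__(self):
        return len(self._log)

class Consumer:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0                   # the only per-consumer metadata

    def poll(self):
        if self.offset < len(self.partition):
            msg = self.partition.read(self.offset)
            self.offset += 1              # advance position in the log
            return msg
        return None                       # caught up: nothing new yet

p = Partition()
for m in ["m0", "m1", "m2"]:
    p.append(m)

fast, slow = Consumer(p), Consumer(p)
assert fast.poll() == "m0" and fast.poll() == "m1"
assert slow.poll() == "m0"                # slow consumer is unaffected
```

Because consumption only moves a cheap per-consumer pointer, retention can be a pure time-based policy on the broker, independent of how many consumers exist or how far behind they are.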

Building a recommendation engine, foursquare style Mar 22nd Last summer, foursquare's employee count had grown a bit beyond our office capacity (as we surged towards 20 employees) and we had people sitting in whatever open space we could find. We were split between floors, parked at folding tables, and crammed onto couches and loveseats. In one of those seats, @anoopr was playing around with building a map showing interesting places, which we called "Explore." After that initial discussion, we quickly set up an API endpoint for Explore and started adding and tweaking features. With the results we were seeing, we could already sense that Explore was going to become something awesome. Our mobile web test client At this point, it was time to build some personalization into the algorithm. One of the hardest parts of building this was determining what the algorithm should do. While we're keeping the new "cold start" algorithm as part of our secret sauce, we wanted to give you a closer look into the data that fed the ranking. What's next?

Welcome to Apache™ Hadoop™!

For fast, interactive Hadoop queries, Drill may be the answer (Cloud Computing News)

JAGS - Just Another Gibbs Sampler

Introducing Cascalog: a Clojure-based query language for Hadoop I'm very excited to be releasing Cascalog as open source today. Cascalog is a Clojure-based query language for Hadoop inspired by Datalog. Highlights: Simple – functions, filters, and aggregators all use the same syntax. OK, let's jump into Cascalog and see what it's all about! Basic queries First, let's start the REPL and load the playground: lein repl, then user=> (use 'cascalog.playground) and (bootstrap). This will import everything we need to run the examples. user=> (? … This query can be read as "Find all …". OK, let's try something more involved. user=> (? … That's pretty simple too. Let's run that query again, but this time include the ages of the people in the results: user=> (? … All we had to do was add the ?… variable. Let's do another query and find all the male people that Emily follows: user=> (? … You may not have noticed, but there's actually a join happening in this query. Structure of a query Let's look at the structure of a query in more detail. user=> (? … The query operator we've been using is ?…
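The REPL examples are truncated in this excerpt, but the implicit join the post mentions ("find all the male people that Emily follows") can be sketched in plain Python over two fact tables. The sample data is hypothetical; Cascalog expresses the same query declaratively, with the join implied by a shared variable:

```python
# Datalog-style join, sketched imperatively: a query over two fact
# relations, follows(follower, followee) and gender(person, g), joined
# on the shared person variable. All sample data here is made up.
follows = [("emily", "bob"), ("emily", "george"), ("emily", "kate")]
gender = [("bob", "m"), ("george", "m"), ("kate", "f")]

def male_people_followed_by(follower):
    genders = dict(gender)                # person -> gender lookup
    return [p for f, p in follows         # scan follows facts...
            if f == follower              # ...restrict to this follower
            and genders.get(p) == "m"]    # ...and join against gender

assert male_people_followed_by("emily") == ["bob", "george"]
```

In Cascalog the join never appears explicitly: using the same logic variable in two predicates is what makes the query engine unify them, which is the point the post is making.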
