NoSQL use cases

How do LinkedIn's recommendation systems work?

Using Apache Hadoop to Find Signal in the Noise: Analyzing Adverse Drug Events

Last month at the Web 2.0 Summit in San Francisco, Cloudera CEO Mike Olson presented some work the Cloudera Data Science Team did to analyze adverse drug events. We decided to share more detail about this project because it demonstrates how to use a variety of open-source tools – R, Gephi, and Cloudera’s Distribution Including Apache Hadoop (CDH) – to solve an old problem in a new way.

An adverse drug event (ADE) is an unwanted or unintended reaction that results from the normal use of one or more medications. The consequences of ADEs range from mild allergic reactions to death, with one study estimating that 9.7% of adverse drug events lead to permanent disability. Another study showed that each patient who experiences an ADE remains hospitalized for an additional 1-5 days and costs the hospital up to $9,000.

Some adverse drug events are caused by drug interactions, where two or more prescription or over-the-counter (OTC) drugs taken together lead to an unexpected outcome.

Fast, easy, realtime metrics using Redis bitmaps

At Spool, we calculate our key metrics in real time. Traditionally, metrics are computed by a batch job (running hourly, daily, etc.). Redis-backed bitmaps allow us to perform such calculations in real time and are extremely space efficient. In a simulation of 128 million users, a typical metric such as “daily unique users” takes less than 50 ms on a MacBook Pro and only takes 16 MB of memory. Spool doesn’t have 128 million users yet, but it’s nice to know our approach will scale. We thought we’d share how we do it, in case other startups find our approach useful.
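The 16 MB figure is consistent with storing one bit per user: 128,000,000 bits divided by 8 is 16,000,000 bytes. The sketch below illustrates the idea with a plain in-memory bitmap rather than a live Redis server. The `Bitmap` class and its method names are hypothetical helpers written for this illustration, not Spool's code; `set_bit` and `population_count` play roughly the roles of the Redis `SETBIT` and `BITCOUNT` commands.

```python
class Bitmap:
    """In-memory stand-in for a Redis bitmap value (illustrative only)."""

    def __init__(self) -> None:
        self._bytes = bytearray()

    def set_bit(self, offset: int) -> None:
        # Roughly what SETBIT key offset 1 does: grow the buffer on demand.
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            self._bytes.extend(bytes(byte_index + 1 - len(self._bytes)))
        # Like Redis, number bits from the most significant bit of each byte.
        self._bytes[byte_index] |= 0x80 >> bit_index

    def get_bit(self, offset: int) -> bool:
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            return False
        return bool(self._bytes[byte_index] & (0x80 >> bit_index))

    def population_count(self) -> int:
        # Roughly what BITCOUNT key does: count the bits set to 1.
        return sum(bin(b).count("1") for b in self._bytes)

    def size_in_bytes(self) -> int:
        return len(self._bytes)


# Each user is identified by an offset; repeat logins set the same bit again.
daily_active = Bitmap()
for user_id in (3, 8, 3, 15):
    daily_active.set_bit(user_id)

unique_users_today = daily_active.population_count()

# One bit per user: 128 million users fit in 16,000,000 bytes (16 MB).
bytes_for_128m_users = 128_000_000 // 8
```

Because a repeat login only sets a bit that is already 1, the population count gives the number of unique users without any separate deduplication step.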

Crash Course on Bitmap and Redis Bitmaps

Bitmap (aka Bitset): A Bitmap or bitset is an array of zeros and ones.

Population Count: The population count of a Bitmap is the number of bits set to 1.

Bitmaps in Redis: Redis allows binary keys and binary values.

A simple example: Daily Active Users. To count unique users that logged in today, we set up a bitmap where each user is identified by an offset value.

Optimizations

Sample Code

DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second

I remember the excitement when Twitter first opened up their firehose. As an early adopter of the Twitter API I could easily imagine some of the cool things you could do with all that data. I also remember the disappointment of learning that in the land of BigData, data has a price, and that price would be too high for little fish like me. It was like learning for the first time there would be no BigData Santa Claus.