Technology

Tablet Devices

Transistors

For online auction powerhouse eBay, big data is serious business. The company has 100 million active users globally, 300 million live listings at any time (and it archives them all), receives 2 billion page views daily, and handles 250 million search queries and 75 billion database calls a day. How does eBay make sense of all this activity? http://gigaom.com/cloud/under-the-covers-of-ebays-big-data-operation/

Under the covers of eBay’s big data operation — Cloud Computing News

Payments

API

Analytics

POS

ParkFree

http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/

NoSQL Pioneers Are Driving the Web's Manifest Destiny — Tech News and Analysis

Twitter has scaled back its plans to store billions of tweets using Cassandra, a non-relational database project that Facebook created and open sourced. Friday night, Twitter said that it will still use Cassandra in a new real-time analytics project it is building, but the decision to move away from plans to migrate tweets from its current MySQL database to Cassandra is seen by some as a blow to startups and open-source projects that are attempting to move beyond relational databases. But in reality, the level of interest about what database architecture some popular startup is using goes beyond Twitter and Cassandra, and touches on the changing nature of both the web and the software that underlies it. In short, the story here isn’t about Cassandra or databases themselves, but about groups of pioneering programmers reacting to the new ways they can build software in a world where computing is cheap.
http://css.dzone.com/articles/cassandra-nosql-database After Facebook made the Cassandra project open source in 2008, the highly scalable, non-relational distributed database proved its mettle at other companies including Cisco, Digg, and Rackspace. Now the well-known NoSQL database has proven itself as an active Apache project with enough training to graduate from the incubator. The board recently approved a resolution to adopt Cassandra as a Top Level Project. The new "Apache Cassandra" should serve as another example of a high-profile, non-relational data solution and success story.

Cassandra NoSQL Database an Apache Top Level Project | Web Builder Zone

Saying Yes to NoSQL; Going Steady with Cassandra | Digg About

http://about.digg.com/blog/saying-yes-nosql-going-steady-cassandra The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP. Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who's been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.
http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king There have been confirmed rumors for a long time about Twitter planning to use Cassandra. But apart from the post mentioned, I couldn’t find any other references. Twitter is fun by itself and we all know that NoSQL projects love Twitter.

Cassandra @ Twitter: An Interview with Ryan King • myNoSQL

How Twitter Uses NoSQL - ReadWriteCloud

InfoQ has released a video of Twitter's Kevin Weil speaking at Strange Loop earlier this year on how the company uses NoSQL. Weil is quick to point out that Twitter is heavily dependent on MySQL. However, Twitter does employ NoSQL solutions for many purposes for which MySQL isn't ideal. According to Weil, Twitter users generate 12 terabytes of data a day - about four petabytes per year. And that amount is multiplying every year. Read on for our notes on Weil's talk. http://www.readwriteweb.com/cloud/2011/01/how-twitter-uses-nosql.php

NSA open sources Google database mimic • The Register

http://www.theregister.co.uk/2011/09/06/nsa_to_open_source_google_bigtable_like_database/ The US National Security Agency is open sourcing a distributed "NoSQL" database based on Google's proprietary BigTable platform. Known as Accumulo, the platform has been in development at the NSA for over three years, and it's built atop Hadoop, the open source distributed file system and distributed number-crunching platform that mimics Google's internal infrastructure. Unlike existing BigTable mimics such as HBase, Accumulo has "fine-grained" access controls and a new server-side programming mechanism that can modify data that's written to disk, or returned to the user. Using the cell-level access labels, you can provide external servers with access to some cells in the Accumulo data store but not others. The NSA believes this may be of interest to government and health care operations and other outfits concerned with privacy.
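
The cell-level access labels described in the excerpt above can be illustrated with a small sketch. This is not the Accumulo API: the Cell class, the visible_cells function, and the sample labels and rows are all hypothetical, and the rule used here (a reader sees a cell only if it holds every label on that cell) is a simplifying assumption rather than a description of how Accumulo evaluates labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A single stored value plus the labels required to read it (hypothetical model)."""
    row: str
    column: str
    value: str
    labels: frozenset  # labels a reader must hold to see this cell


def visible_cells(cells, reader_labels):
    """Return only the cells whose required labels are all held by the reader."""
    held = frozenset(reader_labels)
    return [cell for cell in cells if cell.labels <= held]


# Made-up sample data: three cells in one row, each with different label requirements.
cells = [
    Cell("patient-1", "name", "A. Example", frozenset({"admin"})),
    Cell("patient-1", "blood_type", "O+", frozenset({"clinical"})),
    Cell("patient-1", "visit_count", "3", frozenset()),  # no labels: readable by anyone
]

# An external server holding only the "clinical" label sees some cells but not others.
for cell in visible_cells(cells, {"clinical"}):
    print(cell.column, cell.value)
```

The filter runs per cell rather than per row or per table, which is the point the excerpt makes: two readers querying the same row can receive different subsets of its columns.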
http://glinden.blogspot.com/2006/12/talk-on-ebay-architecture.html

Talk on eBay architecture

Randy Shoup and Dan Pritchett gave a talk on scaling eBay, "The eBay Architecture", at SD Forum 2006. The slides are available (PDF). The parallels with Amazon are remarkable. Like Amazon, eBay started with a two-tiered architecture.
Data Hegemony

This is a wonderfully informative Amazon update based on Joachim Rohde's discovery of an interview with Amazon's CTO. You'll learn about how Amazon organizes their teams around services, the CAP theorem of building scalable systems, how they deploy software, and a lot more. Many new additions from the ACM Queue article have also been included. Amazon grew from a tiny online bookstore to one of the largest stores on earth.

High Scalability - Amazon Architecture

http://highscalability.com/amazon-architecture
http://money.howstuffworks.com/amazon1.htm

Amazon Technology

The massive technology core that keeps Amazon running is entirely Linux-based. As of 2005, Amazon has the world's three largest Linux databases, with a total capacity of 7.8 terabytes (TB), 18.5 TB and 24.7 TB respectively. The central Amazon data warehouse is made up of 28 Hewlett Packard servers, with four CPUs per node, running Oracle 9i database software. The data warehouse is roughly divided into three functions: query, historical data and ETL (extract, transform, and load -- a primary database function that pulls data from one source and integrates it into another). The query servers (24.7 TB capacity) contain 15 TB of raw data in 2005; the click history servers (18.5 TB capacity) hold 14 TB of raw data; and the ETL cluster (7.8 TB capacity) contains 5 TB of raw data.
Twitter is arguably the most heavily used Ruby on Rails application in the world. Almost since its inception, Twitter has fostered a wildly passionate cult following. Also from the beginning, Twitter has suffered from chronic outages under that load. In the past month, record downtime has prompted fresh outcry within its ever-growing user base.

Did Rails Sink Twitter? » SitePoint