Scalability

Facebook trapped in MySQL ‘fate worse than death’ — Cloud Computing News

MongoDB is Web Scale

Why are Facebook, Digg, and Twitter so hard to scale?

Real-time social graphs (connectivity between people, places, and things) are one reason scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder to scale than traditional websites. Why would that be?

Bytepawn - Scalable Web Architectures and Application State

In this article we follow a hypothetical programmer, Damian, on his quest to make his web application scalable. Fast forward to 2009: Damian's site has evolved into a web game for playing dungeons online, in your browser. Damian is still using LAMP. Data about game types (including parameters such as monster strength), user data (including status information), and data about active games (including players and each monster's health) are all still stored in MySQL. That works fine while there are only a few games in session and a couple hundred players, but as the site gets popular, Damian's server starts to show high load numbers. Damian is experiencing scalability issues: his current setup cannot handle tens of thousands of users.
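Damian's hot path is the constantly mutating state of active games: every hit on a monster becomes a MySQL write. A minimal sketch of the usual remedy, holding hot state in memory and checkpointing it in batches; all names here (GameState, apply_hit, checkpoint) are hypothetical, not from the Bytepawn article:

```python
# Hypothetical sketch: keep hot game state in RAM and checkpoint it
# to MySQL in batches instead of issuing one UPDATE per monster hit.

ACTIVE_GAMES = {}  # game_id -> GameState, in-memory store for live games

class GameState:
    """Frequently mutated state for one active game."""
    def __init__(self, game_id, monster_health):
        self.game_id = game_id
        self.monster_health = monster_health
        self.dirty = False  # True if RAM is ahead of the database

def apply_hit(game_id, damage):
    # The hot path: mutate memory only, no database round trip.
    state = ACTIVE_GAMES[game_id]
    state.monster_health -= damage
    state.dirty = True

def checkpoint(db_cursor):
    # Run every few seconds: batched writes instead of a write per hit.
    for state in ACTIVE_GAMES.values():
        if state.dirty:
            db_cursor.execute(
                "UPDATE games SET monster_health = %s WHERE id = %s",
                (state.monster_health, state.game_id),
            )
            state.dirty = False
```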

Against all the odds

First, let's define the "odds": in my social network, I found that 93% of mainstream developers sanctify the database, or at least treat it as the ultimate, superhero, undefeatable solution to any data-persistence challenge. Personally, I think this problem comes from education, and some companies are involved in it as well. To start fixing this thinking, we should all agree on the following point: every challenge has its own solutions, so whatever you want to persist, there are always many options. If you agree with the points above, the question becomes: do we really need a database in every application?

Are Cloud Based Memory Architectures the Next Big Thing?

» Scalable Web Applications Programming the new world: Programmi

Purpose of the entry: on Saturday, June 13th, 2009, I attended a talk by Eli White on scalable web applications. Eli White previously worked at Digg.com and now holds the position of PHP Community Manager & DevZone Editor-in-Chief at Zend Technologies.

Node and Scaling in the Small vs Scaling in the Large

Over the past few weeks, I’ve been taking whatever spare moments I can find to think about what technologies we’re going to use to build the initial release of BankSimple. Many people would probably assume that I’d immediately reach for Scala, what with having co-authored a book on the language, but that’s not how I approach engineering problems. Each and every problem has an appropriate set of applicable technologies, and it’s up to the engineer to justify their use. (Incidentally, Scala may well be a good fit for BankSimple, in no small part due to a bunch of third-party Java code that we need to integrate with, but that’s a whole different blog post, probably for a whole different blog.) One of the most talked-about technologies amongst the Hacker News crowd is Node, a framework for writing and running event-driven JavaScript code on the V8 virtual machine.
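Node's pitch is that one event loop can service thousands of slow connections without dedicating a blocking thread to each client. The same model, sketched here with Python's asyncio rather than Node's actual API, purely to illustrate the idea:

```python
# Event-driven I/O in miniature: one thread multiplexes many clients.
# This is an asyncio sketch of the model Node popularized, not Node code.

import asyncio

async def handle_client(reader, writer):
    # Runs concurrently with thousands of other handlers on one thread;
    # awaiting I/O yields control back to the event loop.
    data = await reader.readline()
    writer.write(b"echo: " + data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```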

How to Succeed at Capacity Planning Without Really Trying : An I.

Examples

Digg: 4000% Performance In.

An Unorthodox Approach to Database Design : The Coming of the Shard

Update 4: Why you don’t want to shard, by Morgon on the MySQL Performance Blog. Optimize everything else first, and then, if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr.
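For readers new to the technique these updates discuss: sharding, at its simplest, is a deterministic function from a key to one of N databases. A toy sketch; the shard list and connection strings are invented for illustration:

```python
# Minimal hash-based sharding sketch. The DSNs below are hypothetical,
# not taken from any of the articles above.

import hashlib

SHARDS = [
    "mysql://db0.example.com/app",
    "mysql://db1.example.com/app",
    "mysql://db2.example.com/app",
    "mysql://db3.example.com/app",
]

def shard_for(user_id: int) -> str:
    # Hash rather than modulo on the raw id, so that sequential ids
    # don't all land on the same shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The bitter-medicine part shows up immediately: any query that spans
# users (joins, global counts) now has to fan out across every shard.
print(shard_for(42), shard_for(43))
```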

6 Ways to Kill Your Servers - Learning How to Scale the Hard Way

This is a guest post by Steffen Konerow, author of the High Performance Blog.

No to SQL? Anti-database movement gains steam

Eric Lai published a provocative article in Computerworld titled “No to SQL? Anti-database movement gains steam”, in which he pointed to many cases of Internet-based companies choosing an alternative to the traditional SQL database. The write-up was driven by the inaugural get-together of the burgeoning NoSQL community, which seems to represent a growing anti-SQL-database movement. Quoting Jon Travis from the article: “Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system].” The article points to specific examples that led companies such as Google, Amazon, and Facebook to choose an alternative approach.

Among the reasons cited: demand for extremely large scale (“BigTable is used by local search engine Zvents Inc. to write 1 billion cells of data per day”) and the complexity and cost of setting up database clusters.

High Performance Scalable Data Stores

Advice from Google on large distributed systems

Google Fellow Jeff Dean gave a keynote talk at LADIS 2009 on "Designs, Lessons and Advice from Building Large Distributed Systems". Slides (PDF) are available. Some of this talk is similar to Jeff's past talks, but with updated numbers. Let me highlight a few things that stood out. A standard Google server appears to have about 16 GB of RAM and 2 TB of disk. If we assume Google has 500k servers (which seems like a low-end estimate, given that they used 25.5k machine-years of computation in September 2009 on MapReduce jobs alone), that means they can hold roughly 8 petabytes of data in memory and, after 3x replication, roughly 333 petabytes on disk. Jeff says, "Things will crash." He also emphasizes the importance of back-of-the-envelope calculations on performance, "the ability to estimate the performance of a system design without actually having to build it."
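The arithmetic behind those estimates is easy to check. A back-of-the-envelope script using the post's own assumptions (500k servers, 16 GB of RAM and 2 TB of disk each, 3x replication):

```python
# Reproducing the back-of-the-envelope numbers above. The 500k server
# count is the post's own low-end guess, not a published figure.

servers = 500_000
ram_per_server_gb = 16
disk_per_server_tb = 2
replication = 3

ram_total_pb = servers * ram_per_server_gb / 1_000_000  # GB -> PB
disk_raw_pb = servers * disk_per_server_tb / 1_000      # TB -> PB
disk_usable_pb = disk_raw_pb / replication              # after 3x replication

print(f"memory: {ram_total_pb:.0f} PB")           # ~8 PB
print(f"disk (usable): {disk_usable_pb:.0f} PB")  # ~333 PB
```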

" Data center guru James Hamilton was at the LADIS 2009 talk and posted detailed notes. Scaling Online Social Networks without Pains. Real World Web: Performance & Scalability. Put that database in memory. An upcoming paper, "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM" (PDF), makes some interesting new arguments for shifting most databases to serving entirely out of memory rather than off disk.

Put that database in memory

An upcoming paper, "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM" (PDF), makes some interesting new arguments for shifting most databases to serving entirely out of memory rather than off disk. The paper looks at Facebook as an example and points out that, due to aggressive use of memcached and of caches in MySQL, the memory Facebook uses is already about "75% of the total size of the data (excluding images)." The authors go on to argue that a system designed around in-memory storage, with disk used only for archival purposes, would be much simpler, more efficient, and faster. They also look at examples of smaller databases and note that, with servers reaching 64 GB of RAM and higher and most databases only a couple of terabytes, it doesn't take many servers to hold everything in memory.
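That sizing argument is, again, back-of-the-envelope arithmetic. A sketch using the figures above; the 50% RAM headroom is my own added assumption, not the paper's:

```python
# How many 64 GB machines does a "couple terabytes" database need
# to live entirely in DRAM? Headroom factor is an assumption here.

data_tb = 2             # "most databases just a couple terabytes"
ram_per_server_gb = 64
headroom = 0.5          # assume only half of each box's RAM holds data

servers_needed = (data_tb * 1024) / (ram_per_server_gb * headroom)
print(f"{servers_needed:.0f} servers")  # 64 servers for 2 TB in memory
```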

10 eBay Secrets for Planet.

Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL

Rumors that Twitter was planning to use Cassandra had been around for a long time, but apart from the post mentioned above, I couldn't find any other references. Twitter is fun by itself, and we all know that NoSQL projects love Twitter. So imagine how excited I was when, after posting about the Cassandra 0.5.0 release, I received a short email from Ryan King, the lead of the Cassandra effort at Twitter, simply saying that he would be glad to talk about it.

Howfuckedismydatabase.com

Troubles with Sharding - What can we learn from the Foursquare Incident?

For everything given, something seems to be taken. Caching is a great scalability solution, but it comes with problems of its own. Sharding is a great scalability solution too, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding has problems as well. MongoDB, the database Foursquare uses, contributed their own post-mortem of what went wrong.
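One lesson from the Foursquare incident is that the choice of shard key decides whether load spreads evenly: when keys cluster, one shard can outgrow its RAM while the others sit idle. A toy simulation contrasting range-based and hash-based placement over a skewed key distribution; the numbers are made up, not Foursquare's actual workload:

```python
# Toy illustration of shard imbalance: skewed keys overload one shard
# under range partitioning, while hashing spreads them roughly evenly.

import random
from collections import Counter

N_SHARDS = 2
# Skewed synthetic user ids in [0, 1): most mass near 0.
writes = [random.betavariate(2, 5) for _ in range(100_000)]

range_sharded = Counter(int(uid * N_SHARDS) for uid in writes)
hash_sharded = Counter(hash(f"user:{uid}") % N_SHARDS for uid in writes)

print("range-based:", range_sharded)  # heavily skewed toward shard 0
print("hash-based: ", hash_sharded)   # roughly even split
```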