An Unorthodox Approach to Database Design : The Coming of the Sh.
Update 4: Why you don’t want to shard, by Morgan on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine.
Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding.
Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization.
Once upon a time we scaled databases by buying ever bigger, faster, and more expensive machines. What is sharding and how has it come to be the answer to large website scaling problems?
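As a rough illustration of the question above: sharding assigns each user to one of several independent database servers, so that all of a user's rows live together and no single server holds everything. The sketch below is a minimal example under stated assumptions, not any particular site's scheme; the host names and the `shard_for_user` helper are hypothetical.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from config.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]

def shard_for_user(user_id, shards=SHARDS):
    """Map a user ID to one shard so all of that user's data lives together."""
    # Hash the ID so that sequential IDs spread evenly across the shards.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Hash-modulo placement is the simplest option, but changing the number of shards moves most users to a different server. A lookup table that records each user's shard is a common alternative when users need to be rebalanced one at a time.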
Information Sources
What is sharding?
While working at Auction Watch, Dathan got the idea to solve their scaling problems by creating a database server for a group of users and running those servers on cheap Linux boxes.
Nati Shalom's Blog: Scaling Out MySQL.
With the recent acquisition of MySQL by Sun, there has been talk about the MySQL open source database now becoming relevant to large enterprises, presumably because it now benefits from Sun's global support, professional services and engineering organizations. In a blog post about the acquisition, Sun CEO Jonathan Schwartz wrote that this is one of his objectives. While the organizational aspects may have been addressed by the acquisition, MySQL faces some technology limitations which hinder its ability to compete in the enterprise. Like other relational databases, MySQL becomes a scalability bottleneck because it introduces contention among the distributed application components. There are basically two approaches to this challenge that I'll touch on in this post: 1. 2.
Database War Stories #3: Flickr - O'Reilly Radar.
Continuing my series of queries about how “Web 2.0” companies used databases, I asked Cal Henderson of Flickr to tell me “how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?” He replied: “lots of the ‘web 2.0’ feature set doesn’t fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundreds of millions of tags. you can cache stuff that’s slow to generate, but if it’s so expensive to generate that you can’t ever regenerate that view without pegging a whole database server then it’s not going to work.”
Here’s the full text of my exchange with Cal: The first question I asked was “what’s your database architecture?” Here’s what Cal had to say about that: Next, I asked about lessons learned in managing the data store, and any particular war stories that would make great illustrations of those lessons learned.
Maybe Normalizing Isn't Normal.
How I Learned to Stop Worrying and Love Using a Lot of Disk Spac.
Update 3: ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access. Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the "can't write for every read" problem that pops up when keeping counters.
Update 2: onGWT.com threw a GAE load party and a lot of people came. The results are at Load test : Google App Engine = 1, Community = 0. GAE handled a peak of 35 requests/second and a sustained 10 requests/second.
How do you structure your database using a distributed hash table like BigTable? Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commenter and the commentee user shards rather than create a separate comment relation.
But Flickr’s reasoning was genius. From one world view comments logically belong to a relation binding comments and users together.
Gabe Wachob: Google App Engine: Its the Architecture Stupid!
I think most people are missing the point about Google App Engine.
It's the Commoditization of Software Architecture and Scaling Skills
Google App Engine is Google's attempt to democratize the scaling of web applications. Put another way, they're trying to commoditize the hard-to-find skills and experience needed for building massively scalable web apps. Ask any startup and they can find any number of web developers who've built a web 2.0 app. But how many developers or architects have the experience and the mindset to build applications that support hundreds of thousands of concurrent users with five-nines uptime?
Scaling Big is Really Hard
The web industry has mostly moved to a horizontal scaling model, but this is not a model that most web application developers have experience with. Without getting into too much detail, this stuff is hard.
Google AppEngine Makes It Easy by Imposing a New Architecture
Here's why Google App Engine is important, at least in intent.
Why the BigTable is so Important.