background preloader

Hibernate

Facebook Twitter

Database Sharding at Netlog, with MySQL and PHP. This article accompanies the slides from a presentation on database sharding. Sharding is a technique used for horizontal scaling of databases we are using at Netlog. If you’re interested in high performance, scalability, MySQL, php, caching, partitioning, Sphinx, federation or Netlog, read on … This presentation was given at the second day of FOSDEM 2009 in Brussels. FOSDEM is an annual conference on open source software with about 5000 hackers. I was invited by Kris Buytaert and Lenz Grimmer to give a talk in the MySQL Dev Room. The talk was based on an earlier talk I gave at BarcampGent 2. Overview Who am I? Currently I am a Lead Web Developer at Netlog working with php, MySQL and other frontend technologies to develop and improve the features of our social network.

What is Netlog? For those of you, who are unfamiliar with Netlog, it’s best to sketch a little overview of who and what we are, and especially where we come from in terms of userbase and growth. Database Setup 1: Master (W) Sharding the Hibernate Way | High Scalability. Update: A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards. Max defines Hibernate Shards (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they need to do in the future (query, replication, operational tools), and how it relates to Google AppEngine (not much). To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Information Sources What is Hibernate Shards? Shard: splitting up data sets. Schema Design for Shards When sharding you have to consider the general issues of distributed data design for high data volumes.

The Sharding Code’s Relationship to Hibernate Hibernate Shards encapsulates knowledge of all the shards. Pluggable Strategies Determine How Data Are Split Across Shards A Strategy dictates how data are spread across the shards. Some Limitations Related Articles. Piotr Woloszyn » Hibernate Shards. There are situations when you can’t put all data which you need in a single instance of a relational database. The reasons may differ. Maybe because it is too much of the data itself. Or there is a problem with network latency of a distributed architecture. Scaling? Or one of many other reasons. So… There is a new project Hibernate Shards which is a framework that is designed to encapsulate and minimize complexity of accessing multiple databases by adding support for horizontal partitioning on top of Hibernate Core. Org.hibernate.Session org.hibernate.SessionFactory org.hibernate.Criteria org.hibernate.Query have shard-aware extensions: org.hibernate.shards.session.ShardedSession org.hibernate.shards.ShardedSessionFactory org.hibernate.shards.criteria.ShardedCriteria org.hibernate.shards.query.ShardedQuery The implementations for these four interfaces serve as a sharding engine that knows how to apply an application-specific sharding logic.

Popularity: 34% [ ? Share This Permalink. An Unorthodox Approach to Database Design : The Coming of the Sh. Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding.Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization.

With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Once upon a time we scaled databases by buying ever bigger, faster, and more expensive machines. What is sharding and how has it come to be the answer to large website scaling problems? Information Sources What is sharding? The advantages are: High availability. Mike Kruckenberg: Federation at Flickr: Doing Billions of Querie. « Funding: DotCom vs Today | Main | Slides for Creating INFORMATION_SCHEMA Tables » Federation at Flickr: Doing Billions of Queries a Day Listening to Dathan Pattishall talk about flickr at the 2007 MySQL User Conference. Dathan worked at AuctionWatch in 1999, then in 2003 worked at Friendster, now at Flickr.

Flickr was unable to keep up with demand. Replication was not working, too much slave lag. At AuctionWatch they put folks on separate boxes. Shards are a slice of a main database. The global ring is a lookup ring used for data that can't be federated. When you click a favorite. The users are put onto the machine that contains their shard so as they make comments, upload photos etc in real time without seeing the replication lag as data goes to another machine. During normal operation the shards are used at 50% capacity. A typical flickr page is 30 queries. As far as hardware.

MySQL 5.0 is used for auxiliary data (logs) and generating ticket ids (for auto increment).