Database Sharding Helps High-Traffic Sites

Presentations by engineers at Google and Digg have focused attention on database sharding as a technique for managing huge databases.

Several presentations this week have focused attenton on database sharding - breaking a large database into smaller pieces to provide faster access to the data. ZDNet reports how Google used database partitioning to manage a 12 gigabyte database for its financial planning unit. Google's Chris Schulze discussed the technology during a presentation at Hyperion's Solutions 2007 conference in Orlando.

In addition to using sharding to streamline its huge in-house databases, Google last month contributed its data partitioning solution to the open source Hibernate project. A description:

Hibernate Shards offers critical data clustering and support for horizontal partitioning (also called sharding) to Hibernate. Now, customers can keep their data in more than one relational database for whatever reason-too much data or to isolate certain datasets, for instance-without added complexity when building and managing applications. Hibernate Shards is designed to encapsulate and reduce the complexity of building applications that work with sharded datasets.

Database sharding is also used by Digg, which outlined the details of its database management techniques in a presentation at the MySQL conference in Santa Clara this week.

Digg engineer Tim Ellis said sharding has helped achieve faster web site performance, but breaking a database into several smaller ones increases complexity. Ellis explains that a database can be sharded by table, date or range.

Sharding is similar to partitioning, says Ellis, but with several key differences. Sharding usually involves divvying up data onto different physical machines. Partitioning, in contrast, typically occurs on the same piece of hardware. And while MySQL does not natively allow sharding, it does support partitioned tables, federated tables and clusters.

Sharding is a familiar term to those familiar with the infrastructure of online games, where popular MMORPGs like World of Warcraft segment their user base across multiple physical machines running instances of the game.