A Look Inside Wikipedia's Infrastructure
“Down time used to be our most profitable product,” jokes Domas Mituzas, a performance engineer at Wikipedia. The gag is that when Wikipedia is offline, the site often displays a page seeking donations for additional servers.
As a non-profit running one of the world’s busiest web destinations, Wikipedia provides an unusual case study of a high-performance site. In an era when Google and Microsoft can spend a half-billion dollars on one of their global data center projects, Wikipedia runs on fewer than 300 servers from a single data center in Tampa, Fla. It also has servers in Amsterdam at the AMS-IX peering exchange.
“The traditional approach to availability isn’t exactly our way,” said Mituzas, who spoke about Wikipedia’s infrastructure Monday at the O’Reilly Velocity conference. “I’m not suggesting you should follow how we do it. But losing a few seconds of changes doesn’t destroy our business. As long as a crash doesn’t turn into a disaster, there’s no witch hunting or heads rolling.”
The engineers on the Wikipedia team may not take themselves too seriously, but they are serious about performance. That’s in keeping with Wikipedia’s guiding principles, which emphasize community over commerce (the site runs no ads) and getting excellent mileage out of its donations. Wikipedia maintains high 99-percent availability, and the usage data for Wikipedia includes some mind-boggling numbers.
Mituzas, whose works as a MySQL support engineer for Sun Microsystems in his “day job,” shared the following metrics on Wikipedia’s operations:
- 50,000 http requests per second
- 80,000 SQL queries per second
- 7 million registered users
- 18 million page objects in the English version
- 250 million page links
- 220 million revisions
- 1.5 terabytes of compressed data
The site started as a Perl CGI script running on single server in 2001. Wikipedia now has 200 application servers, 20 database servers and 70 servers dedicated to Squid cache servers.
Wikipedia is powered by the MediaWiki software, which was originally written to run Wikipedia and is now an open source project. MediaWiki uses PHP running on a MySQL database. Mituzas said MySQL instances range from 200 to 300 gigabytes. In addition to Squid, Wikipedia uses Memcached and the Linux Virtual Server load balancer. Wikipedia also uses database sharding to set up master-slave relationships between databases.
Mituzas summed up his view of Wikipedia’s operations in a blog post about his Velocity presentation: “As I see it, in such context Wikipedia is more interesting as a case of operations underdog – non-profit lean budgets, brave approaches in infrastructure, conservative feature development, and lots of cheating and cheap tricks (caching! caching! caching!).”
Interesting information and approach. I’d be interested to know more about the version(s) of MySQL they use , adn if any other database platforms
Well, the MySQL (and PHP) version is shown at the top of each wiki’s [[Special:Version]]