Posted By Rich Miller On June 24, 2008 @ 6:44 am In Companies | Comments Disabled
“Down time used to be our most profitable product,” jokes Domas Mituzas, a performance engineer at Wikipedia. The gag is that when Wikipedia is offline, the site often displays a page seeking donations for additional servers.
As a non-profit running one of the world’s busiest web destinations, Wikipedia  provides an unusual case study of a high-performance site. In an era when Google and Microsoft can spend a half-billion dollars on one of their global data center projects, Wikipedia runs on fewer than 300 servers from a single data center in Tampa, Fla. It also has servers in Amsterdam at the AMS-IX peering exchange.
“The traditional approach to availability isn’t exactly our way,” said Mituzas, who spoke about Wikipedia’s infrastructure Monday at the O’Reilly Velocity conference. “I’m not suggesting you should follow how we do it. But losing a few seconds of changes doesn’t destroy our business. As long as a crash doesn’t turn into a disaster, there’s no witch hunting or heads rolling.”
The engineers on the Wikipedia team may not take themselves too seriously, but they are serious about performance. That’s in keeping with Wikipedia’s guiding principles, which emphasize community over commerce (the site runs no ads) and getting excellent mileage out of its donations. Wikipedia maintains high 99-percent availability, and the usage data for Wikipedia includes some mind-boggling numbers.
Mituzas, whose works as a MySQL support engineer for Sun Microsystems in his “day job,” shared the following metrics on Wikipedia’s operations:
The site started as a Perl CGI script running on single server in 2001. Wikipedia now has 200 application servers, 20 database servers and 70 servers dedicated to Squid  cache servers.
Wikipedia is powered by the MediaWiki  software, which was originally written to run Wikipedia and is now an open source project. MediaWiki uses PHP running on a MySQL database. Mituzas said MySQL instances range from 200 to 300 gigabytes. In addition to Squid, Wikipedia uses Memcached  and the Linux Virtual Server  load balancer. Wikipedia also uses database sharding  to set up master-slave relationships between databases.
Mituzas summed up his view of Wikipedia’s operations in a blog post about his Velocity presentation: “As I see it, in such context Wikipedia is more interesting as a case of operations underdog – non-profit lean budgets, brave approaches in infrastructure, conservative feature development, and lots of cheating and cheap tricks (caching! caching! caching!).”
Article printed from Data Center Knowledge: http://www.datacenterknowledge.com
URL to article: http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/
URLs in this post:
 Wikipedia: http://wikipedia.org/
 Squid: http://www.squid-cache.org/
 MediaWiki: http://www.mediawiki.org/wiki/MediaWiki
 Memcached: http://www.danga.com/memcached/
 Linux Virtual Server: http://www.linuxvirtualserver.org/
 database sharding: http://www.datacenterknowledge.com/archives/2007/Apr/27/database_sharding_helps_high-traffic_sites.html
 Mituzas: http://dammit.lt/uc/workbook2007.pdf
 Mark Bergsma: http://www.nedworks.org/~mark/presentations/san/Wikimedia%20architecture.pdf
 blog post : http://dammit.lt/2008/06/19/wikipedia-at-velocity-conference/
 Rich Miller: http://www.datacenterknowledge.com/archives/author/richm/
Copyright © 2012 Data Center Knowledge. All rights reserved.