The Wikimedia Foundation will add a new data center in the coming year to support Wikipedia and its other sites, saying that “ensuring high site availability” for Wikipedia is now the foundation’s number one priority.
“Our projects are vulnerable to primary data center failure,” the group says in its annual report. “We will build out a second data center to enable safe failover in the case of disaster. We will also increase uptime by improving site monitoring, capacity planning and operations response.”
Wikimedia maintains the online encyclopedia Wikipedia, which ranks among the busiest sites in the world alongside web properties from Google, Microsoft, Facebook and Yahoo. Wikimedia has a primary data center in Tampa, as well as a facility in Amsterdam that supports European traffic. The group is reported to have about 300 servers in Tampa and about 50 more in Amsterdam.
Outage in March
Weaknesses in Wikimedia’s infrastructure were exposed during a March outage that ultimately took the main Wikipedia site offline. A cooling problem in Wikimedia’s Amsterdam data center led to overheating that caused a server shutdown. The initial problem affected only European Wikipedia users, but an attempt to “fail over” to the Tampa data center went awry, and the main Wikipedia site was knocked offline.
That incident led to plans to expand Wikimedia’s infrastructure and strengthen its operations. The foundation has budgeted $3.27 million to cover the expense of the new facility, on top of the $1.87 million it expects to spend on maintaining the Tampa and Amsterdam data centers. That level of spending is modest by data center standards, but it represents a major investment for a non-profit like Wikimedia. In February the Wikimedia Foundation received a $2 million grant from Google to help expand its data centers.
Where will the data center be located? “We’re planning a second US data centre, (most) likely in Virginia,” Wikimedia Foundation Executive Director Sue Gardner wrote in a recent email to a list. Like Amsterdam, northern Virginia is a key intersection for communications networks, which connect in the region’s many data centers.
Big Traffic, Tiny Staff
The Wikimedia Foundation acknowledged that it operates one of the world’s most popular web sites with a tiny operations team, “which means that in emergencies, our response has sometimes been sub-optimal, simply because there weren’t enough people available to respond well.” The foundation added: “The smallness of the team has also limited our ability to deploy software updates quickly, to properly monitor and systematically improve site performance, and to ensure that all our data is secure and that our operations infrastructure is resilient and future-proof.”
“In 2010-11, we are focused on eliminating single points of failure (both in terms of staff and infrastructure), improving operations response, monitoring and optimizing site performance around the world, and supporting the software engineering team in its deployments,” the foundation said. “The single largest infrastructure project will be the build-out of a new data centre location, with the primary objective to ensure that we have full failover capacity in the event of a major disaster.”
That’s a change in attitude from when we first wrote about Wikipedia’s infrastructure back in 2008. “Down time used to be our most profitable product,” Domas Mituzas, a performance engineer at Wikipedia, joked at the time. The gag was that whenever Wikipedia was offline, the site often displayed a page seeking donations for additional servers.