This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we’ll zero in on some of those changes to get a better understanding of the pervasive open source data center.
When Facebook was launching the first data center it designed and built on its own, the first of now several Facebook data centers in rural Oregon, Jason Taylor, the company’s VP of infrastructure, expected at least a little fallout from the new power distribution design that was deployed there.
Instead of a centralized UPS plant in a separate room behind the doors of the main data hall, Facebook had battery cabinets sitting side by side with IT racks, ready to push 48V DC power to the servers at a moment’s notice. What made him nervous was that the servers needed 12V AC power, and the mechanism that switched between two different combinations of voltage and current had to work like a Swiss watch if you didn’t want to fry some gear.
“I would have expected at least some fallout,” Taylor said in an interview. But the system was tested many times in the first couple of years in Prineville – both intentionally and unintentionally – and he learned to stop worrying and [insert the rest of the cliché].
Facebook later open sourced some of the innovations in data center design that were introduced in Prineville through the Open Compute Project, an initiative started by the social networking giant to bring some open source software ethos to IT hardware, power, and cooling infrastructure.
Better Efficiency Through Disaggregation
One of the interesting aspects of Facebook data center designs is that the company has been able to scale tremendously without increasing power density. Several years ago, many data center industry experts predicted that the overall amount of power per rack would grow in data centers, a forecast that for the most part has not materialized.
“Rather than targeting 20kW per rack or 15kW per rack, we actually targeted about 5.5kW per rack,” Taylor said. “We understood that the low power density on racks was just fine.”
One big reason Facebook has been able to keep its data centers low-density is that its infrastructure and software teams have been willing to completely rethink their methods on a regular basis. This, coupled with advances in processors and networking technology, has resulted in new levels of efficiency that enabled Facebook to do more with less.
One of the most powerful concepts that resulted from this kind of rethinking is disaggregation, or looking at an individual component of a switch or a server as the basic infrastructure building block – be it CPU, memory, disk, or a NIC – not the entire box.
Disaggregation in Action
A good example of how powerful disaggregation can be is the backend infrastructure that populates a Facebook user’s news feed. Until sometime last year, Multifeed, as the news feed backend is called, consisted of uniform servers, each with the same amount of memory and CPU capacity.
The query engine that pulls data for the news feed, called Aggregator, uses a lot of CPU power. The storage layer it pulls data from keeps it in memory, so it can be delivered faster. This layer is called Leaf, and it taxes memory quite heavily.
The previous version of a Multifeed rack contained 20 servers, each running both Aggregator and Leaf. To keep up with user growth, Facebook engineers continued adding servers and eventually realized that while CPUs on those servers were being heavily utilized, a lot of the memory capacity was sitting idle.
To fix the inefficiency, they redesigned the way Multifeed works – the way the backend infrastructure was set up and the way the software used it. They designed separate servers for Aggregator and Leaf functions, the former with lots of compute, and the latter with lots of memory.
This resulted in a 40 percent efficiency improvement in the way Multifeed used CPU and memory resources. The infrastructure went from a CPU-to-RAM ratio of 20:20 to 20:5 or 20:4, a 75 to 80 percent reduction in the amount of memory that needs to be deployed.
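Taking the quoted CPU-to-RAM ratios at face value, the size of the memory reduction can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `memory_reduction` is made up for illustration, and the "units" are whatever the ratio counts, which the article does not specify.

```python
def memory_reduction(before_units: float, after_units: float) -> float:
    """Fractional drop in deployed memory, with CPU capacity held constant."""
    if before_units <= 0:
        raise ValueError("before_units must be positive")
    return (before_units - after_units) / before_units


# Ratios quoted in the article: 20:20 before, 20:5 or 20:4 after.
for after in (5, 4):
    pct = memory_reduction(20, after) * 100
    print(f"20:20 -> 20:{after}: {pct:.0f}% less memory")
```

With the CPU side fixed at 20, going from 20 memory units to 5 or 4 removes three quarters to four fifths of the memory.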
Network – the Great Enabler
According to Taylor, this Leaf-Aggregator model, which is now also used for search and many other services, would not have been possible without the huge increases in network bandwidth Facebook has been able to enjoy.
“A lot of the most interesting stuff that’s happening in software at large scale is really being driven by the network,” he said. “We’re able to make these large long-term software bets on the network.”
Today, servers and switches in Facebook data centers are interconnected with 40-Gig links, up from the 1-Gig links that ran from the top-of-rack switch to the server just six years ago. New Facebook data centers being built today will use 100-Gig connectivity, thanks to the latest Wedge 100 switch the company designed and announced earlier this year.
“As of January of next year, everything will be 100 Gig,” Taylor said.
With that amount of bandwidth, having memory next to CPU is becoming less and less important. You can split the components and optimize for each individual one, without compromises.
“Locality is starting to become a thing of the past,” Taylor said. “The trend in networking over the last six years is too big to ignore.”
Disaggregation Keeps Density Down
Disaggregation has also helped keep overall power density in Facebook data centers in check.
Some compute-heavy racks, such as the ones populated with web servers, can be between 10kW and 12kW per rack. Others, such as the ones packed with storage servers, can be about 4.5kW per rack.
As long as the overall facility averages out to about 5.5kW per rack, it works, Taylor said.
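The averaging Taylor describes is a rack-count-weighted mean. The Python sketch below shows how a few dense racks and many light ones can land on the 5.5kW target. The rack counts and the `average_density_kw` helper are hypothetical, chosen only to make the arithmetic visible; the per-rack figures (11kW as a midpoint for web racks, 4.5kW for storage racks) come from the numbers quoted above.

```python
def average_density_kw(racks: dict[str, tuple[int, float]]) -> float:
    """Mean power per rack, weighted by how many racks of each type exist.

    `racks` maps a rack type to (number of racks, kW per rack).
    """
    total_racks = sum(count for count, _ in racks.values())
    if total_racks == 0:
        raise ValueError("at least one rack is required")
    total_kw = sum(count * kw for count, kw in racks.values())
    return total_kw / total_racks


# Hypothetical mix: rack counts are illustrative, not Facebook's real layout.
hall = {
    "web": (2, 11.0),      # compute-heavy racks, midpoint of 10kW to 12kW
    "storage": (11, 4.5),  # storage racks at about 4.5kW
}

print(f"Average density: {average_density_kw(hall):.2f} kW per rack")
```

In this made-up mix, roughly one web rack for every five or six storage racks keeps the hall at the 5.5kW average, even though individual web racks draw twice that.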
One of the disaggregation extremes Facebook has gone to recently is designing storage servers specifically for rarely accessed user content, such as old photos, and designing separate facilities next to its primary data centers optimized just for those servers.
The “cold storage racks are unbelievably cold,” Taylor said, referring to the amount of power they consume. They are at 1 to 1.5kW per rack, he said.
As a result, it now takes 75 percent less energy to store and serve photos people dig out of their archives to post on a Thursday with a #tbt tag than it did when those photos were stored in the primary data centers.
As it looks for greater and greater efficiency, Facebook continues to re-examine and refine the way it designs software and the infrastructure that software runs on.
The concept of disaggregation has played a huge role in helping the company scale its infrastructure and increase its capacity without increasing the amount of power it requires. But disaggregation at that scale would not have been possible without the rapid progress in data center networking technology in recent years.