Every time Facebook’s data center engineers figure out a way to reduce server consumption by a single watt, the improvement, at Facebook’s scale, has the potential to add millions of dollars to the company’s bottom line. This is why its infrastructure team never stops looking for new ways to win back some power. The company’s product consists of bits and bytes, and its data centers are its manufacturing plants. The more efficiently Facebook servers can convert power into text, pictures and video, the wider its profit margin becomes.
Having optimized hardware and supporting data center infrastructure across the board, the company's next step is using software management tools to utilize that infrastructure more efficiently. The latest such tool is Autoscale. It was designed to work in tandem with the load balancers that distribute workload among servers in each cluster, making sure the load balancers take energy efficiency into consideration when doing their job.
Qiang Wu, infrastructure software engineer at Facebook, described Autoscale in a post on the Facebook Engineering blog Friday. The system is already running on production clusters in Facebook data centers and has been effective in substantially reducing energy consumption.
‘Round-robin’ load balancing is inefficient
Workload on Facebook servers fluctuates throughout the day, peaking around noon and dropping substantially around midnight. Until now, a load balancer would distribute the load evenly among servers in a cluster it manages, regardless of the size of the workload. Thus, CPU utilization would drop to very low levels across the board during low-demand periods and increase across the board as demand grew.
The team learned, however, that this “round-robin” approach to load balancing is not really the most efficient way to go. “As a result, during low-workload hours, especially around midnight, overall CPU utilization is not as efficient as we’d like,” Wu wrote.
One type of web server Facebook uses, for example, draws 60 watts when idle, 130 watts at low-level CPU utilization and 150 watts at medium-level utilization. The difference in power consumption between low-level and medium-level utilization is small, while the difference in the number of requests processed at the two levels is quite large. Therefore, to increase the overall output-per-watt ratio, it is better to avoid running servers at low-level utilization and keep them either idle or at medium-level utilization.
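The arithmetic behind that conclusion is easy to sketch. The wattage figures below come from the article; the request rates per utilization level are hypothetical placeholders, since the article does not give them:

```python
# Power draw per utilization level, from the article (watts):
POWER_W = {"idle": 60, "low": 130, "medium": 150}

# Hypothetical requests-per-second sustained at each level (NOT from the article):
RPS = {"idle": 0, "low": 100, "medium": 400}

def requests_per_watt(level):
    """Work delivered per watt at a given utilization level."""
    return RPS[level] / POWER_W[level]

# Medium utilization yields far more work per watt than low utilization,
# which is why consolidating load onto fewer, busier servers saves energy.
print(f"low:    {requests_per_watt('low'):.2f} req/s per watt")
print(f"medium: {requests_per_watt('medium'):.2f} req/s per watt")
```

Under any plausible numbers with this shape, the ratio at medium utilization dominates, since the extra 20 watts buys a large increase in throughput.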
Fighting the evil of low CPU utilization
This is where Autoscale comes in. An Autoscale controller collects request volume and CPU utilization data from a cluster and dictates to the load balancer responsible for that cluster how many servers should process the traffic at hand. By limiting the number of active servers in a cluster, the software increases utilization of each active server while leaving the rest idle. Autoscale adjusts the size of the active server pool dynamically.
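A minimal sketch of that control loop might look like the following. The function name, parameters, and numbers are all hypothetical; Facebook's actual implementation is not public:

```python
import math

def active_pool_size(current_rps, rps_per_server_at_target, total_servers,
                     min_servers=1):
    """Return how many servers should stay in the load balancer's rotation,
    given current traffic and the rate one server handles at its target
    utilization level."""
    needed = math.ceil(current_rps / rps_per_server_at_target)
    # Never shrink below a safety floor, never exceed the cluster size.
    return max(min_servers, min(needed, total_servers))

# Example: at midnight-level traffic, only a fraction of the cluster
# stays active; the rest sit idle at their lower idle power draw.
print(active_pool_size(current_rps=12_000, rps_per_server_at_target=400,
                       total_servers=100))  # -> 30 active, 70 idle
```

Re-running this calculation periodically as request volume changes is what makes the active pool "dynamic": it grows back toward the full cluster as traffic climbs toward the midday peak.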
To decide how big the optimal active server pool should be, Facebook engineers model the relationship between CPU utilization and requests per second, among other factors. While energy efficiency is important, it is also important not to load a subset of hardware to a point where performance starts suffering. “We conduct experiments to understand how they correlate and then estimate the model based on experimental data,” Wu wrote.
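One simple way to picture the modeling step is a least-squares fit of CPU utilization against requests per second, inverted to find the rate one server can carry at a target utilization. The sample data and the assumption of a linear relationship are illustrative only; the real model presumably involves more factors:

```python
def fit_linear(samples):
    """Ordinary least squares for util = a * rps + b over (rps, util) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var = sum((x - mean_x) ** 2 for x, _ in samples)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical (rps, cpu_util_percent) measurements from load experiments:
samples = [(100, 20), (200, 35), (300, 50), (400, 65)]
a, b = fit_linear(samples)

# Invert the model: how many req/s fit under a target utilization ceiling
# chosen to stay below the point where performance starts to suffer?
target_util = 60.0
rps_at_target = (target_util - b) / a
print(f"one server handles ~{rps_at_target:.0f} req/s at {target_util:.0f}% CPU")
```

That per-server capacity estimate is exactly the kind of input the controller needs to size the active pool.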
Autoscale works as intended
Results of implementing Autoscale in production are impressive. The team described normalized energy consumption data from one of its production web clusters managed by Autoscale over a 24-hour period, showing a 27-percent power-draw reduction around midnight.
The savings decrease gradually as the workload grows, eventually reaching zero at peak demand. Average power savings over 24 hours across a number of different web clusters are between 10 percent and 15 percent. “In a system with a large number of web clusters, Autoscale can save a significant amount of energy,” the engineers wrote.