Google is using machine learning and artificial intelligence to wring even more efficiency out of its mighty data centers.
In a presentation today at Data Centers Europe 2014, Google's Joe Kava said the company has begun using a neural network to analyze the oceans of data it collects about its server farms and to recommend ways to improve them. Kava is the Internet giant's vice president of data centers.
In effect, Google has built a computer that knows more about its data centers than even the company's engineers. The humans remain in charge, but Kava said the use of neural networks will allow Google to reach new frontiers in efficiency in its server farms, moving beyond what its engineers can see and analyze.
Google already operates some of the most efficient data centers on earth. Using artificial intelligence will allow Google to peer into the future and model how its data centers will perform in thousands of scenarios.
In early usage, the neural network has been able to predict Google's Power Usage Effectiveness (PUE), the ratio of a facility's total energy use to the energy consumed by its IT equipment, with 99.6 percent accuracy. Its recommendations have produced efficiency gains that appear small but can add up to major cost savings when applied across a data center housing tens of thousands of servers.
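To see why a small change in PUE matters at scale, here is a rough worked example in Python. PUE is total facility energy divided by IT equipment energy. The load and PUE figures below are hypothetical, chosen only to make the arithmetic easy to follow. They are not Google's numbers.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def annual_savings_kwh(it_load_kw: float, pue_before: float, pue_after: float) -> float:
    """Energy saved per year when PUE drops while the IT load stays constant."""
    hours_per_year = 24 * 365
    return it_load_kw * (pue_before - pue_after) * hours_per_year


# Hypothetical figures, for illustration only.
example_pue = pue(total_facility_kwh=1120.0, it_equipment_kwh=1000.0)
saved = annual_savings_kwh(it_load_kw=10_000.0, pue_before=1.12, pue_after=1.10)

print(f"PUE = {example_pue:.2f}")
print(f"Saved about {saved:,.0f} kWh per year")
```

Under these assumed figures, a drop of just 0.02 in PUE on a 10-megawatt IT load works out to well over a million kilowatt-hours a year.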
Why turn to machine learning and neural networks? The primary reason is the growing complexity of data centers, a challenge for Google, which uses sensors to collect hundreds of millions of data points about its infrastructure and its energy use.
"In a dynamic environment like a data center, it can be difficult for humans to see how all of the variables interact with each other," said Kava. "We've been at this (data center optimization) for a long time. All of the obvious best practices have already been implemented, and you really have to look beyond that."
Enter Google's 'Boy Genius'
Google's neural network was created by Jim Gao, an engineer whose colleagues have given him the nickname "Boy Genius" for his prowess analyzing large datasets. Gao had been doing cooling analysis using computational fluid dynamics, which uses monitoring data to create a 3D model of airflow within a server room.
Gao thought it was possible to create a model that tracks a broader set of variables, including IT load, weather conditions, and the operations of the cooling towers, water pumps and heat exchangers that keep Google's servers cool.
"One thing computers are good at is seeing the underlying story in the data, so Jim took the information we gather in the course of our daily operations and ran it through a model to help make sense of complex interactions that his team - being mere mortals - may not otherwise have noticed," Kava said in a blog post. "After some trial and error, Jim’s models are now 99.6 percent accurate in predicting PUE. This means he can use the models to come up with new ways to squeeze more efficiency out of our operations."
How It Works
Gao began working on the machine learning initiative as a "20 percent project," a Google tradition of allowing employees to spend a chunk of their work time exploring innovations beyond their specific work duties. Gao wasn't yet an expert in artificial intelligence. To learn the fine points of machine learning, he took a course from Stanford University Professor Andrew Ng.
Neural networks loosely mimic the way the human brain works, allowing computers to adapt and "learn" tasks from examples rather than being explicitly programmed for them. Google's search engine is often cited as an example of this type of machine learning, which is also a key research focus at the company.
"The model is nothing more than a series of differential calculus equations," Kava explained. "But you need to understand the math. The model begins to learn about the interactions between these variables."
Gao's first task was crunching the numbers to identify the factors that had the largest impact on energy efficiency of Google's data centers, as measured by PUE. He narrowed the list down to 19 variables and then designed the neural network, a machine learning system that can analyze large datasets to recognize patterns.
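For readers curious what such a model looks like in practice, here is a minimal sketch in plain Python: a small neural network that takes 19 inputs and learns to predict a single PUE-like number. It is not Gao's model. The single hidden layer, its width, the learning rate, and the synthetic training data are all assumptions made for illustration, and the 19 inputs are random stand-ins rather than real sensor readings.

```python
import math
import random

random.seed(0)
N_INPUTS = 19  # the article's variable count; the inputs here are random stand-ins
N_HIDDEN = 8   # assumed hidden-layer width


def synthetic_sample():
    """Return (features, pue) from a made-up smooth function; not real data."""
    x = [random.uniform(-1.0, 1.0) for _ in range(N_INPUTS)]
    y = 1.1 + 0.05 * math.tanh(x[0] + 0.5 * x[1]) + 0.02 * x[2] * x[3]
    return x, y


train = [synthetic_sample() for _ in range(400)]
test = [synthetic_sample() for _ in range(100)]

# One hidden layer of tanh units feeding a single linear output.
w1 = [[random.gauss(0, 0.3) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b1 = [0.0] * N_HIDDEN
w2 = [random.gauss(0, 0.3) for _ in range(N_HIDDEN)]
b2 = 0.0


def forward(x):
    """Return the hidden activations and the predicted value for one sample."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def mean_abs_error(data):
    return sum(abs(forward(x)[1] - y) for x, y in data) / len(data)


error_before = mean_abs_error(test)

LEARNING_RATE = 0.02
for epoch in range(30):
    random.shuffle(train)
    for x, y in train:
        hidden, pred = forward(x)
        err = pred - y  # gradient of 0.5 * err**2 with respect to pred
        for j in range(N_HIDDEN):
            # Backpropagate through tanh before w2[j] is updated.
            grad_hidden = err * w2[j] * (1.0 - hidden[j] ** 2)
            w2[j] -= LEARNING_RATE * err * hidden[j]
            for i in range(N_INPUTS):
                w1[j][i] -= LEARNING_RATE * grad_hidden * x[i]
            b1[j] -= LEARNING_RATE * grad_hidden
        b2 -= LEARNING_RATE * err

error_after = mean_abs_error(test)
print(f"Mean absolute error before training: {error_before:.4f}")
print(f"Mean absolute error after training:  {error_after:.4f}")
```

The error is measured on samples the network never saw during training, which is the usual way to check that a model has learned a pattern rather than memorized its inputs.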
"The sheer number of possible equipment combinations and their setpoint values makes it difficult to determine where the optimal efficiency lies," Gao writes in the white paper on his initiative. "In a live DC, it is possible to meet the target setpoints through many possible combinations of hardware (mechanical and electrical equipment) and software (control strategies and setpoints). Testing each and every feature combination to maximize efficiency would be unfeasible given time constraints, frequent fluctuations in the IT load and weather conditions, as well as the need to maintain a stable DC environment."
Runs On a Single Server
As for hardware, the machine learning doesn't require unusual computing horsepower, according to Kava, who says it runs on a single server and could even work on a high-end desktop.
The system was put to work inside several Google data centers. The machine learning tool was able to suggest several changes that yield incremental improvements in PUE, including refinements in data center load migrations during upgrades of power infrastructure, and small changes in the water temperature across several components of the chiller system.
"Actual testing on Google (data centers) indicates that machine learning is an effective method of using existing sensor data to model DC energy efficiency and can yield significant cost savings," Gao writes.
The Machines Aren't Taking Over
Kava said that the tool may help Google run simulations and refine future designs. But not to worry -- Google's data centers won't become self-aware anytime soon. While the company is keen on automation, and has recently been acquiring robotics firms, the new machine learning tools won't be taking over the management of any of its data centers.
"You still need humans to make good judgments about these things," said Kava. "I still want our engineers to review the recommendations."
The neural networks' biggest benefits may be seen in the way Google builds its server farms in years to come. "I can envision using this during the data center design cycle," said Kava. "You can use it as a forward-looking tool to test design changes and innovations. I know that we're going to find more use cases."
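Used as a forward-looking tool, a trained model can be asked "what if" questions before anyone touches real equipment. The sketch below shows the general idea in Python. The `predicted_pue` function is a hypothetical stand-in with an invented formula, and the candidate settings are made up. A real deployment would call the trained model instead.

```python
def predicted_pue(chilled_water_temp_c: float, towers_running: int) -> float:
    """Stand-in for a trained model; the formula is invented for illustration."""
    temp_term = 0.0015 * (chilled_water_temp_c - 11.0) ** 2
    tower_term = 0.004 * abs(towers_running - 4)
    return 1.10 + temp_term + tower_term


# Hypothetical candidate settings to evaluate.
candidates = [
    (temp, towers)
    for temp in (8.0, 9.0, 10.0, 11.0, 12.0, 13.0)
    for towers in (2, 3, 4, 5, 6)
]

# Rank every scenario by predicted PUE, lowest (most efficient) first.
ranked = sorted(candidates, key=lambda c: predicted_pue(*c))

print("Top scenarios for engineers to review:")
for temp, towers in ranked[:3]:
    print(f"  {temp:.0f} C, {towers} towers -> PUE {predicted_pue(temp, towers):.3f}")
```

The output is a shortlist, not a command: in keeping with Kava's point, the ranked scenarios are recommendations for engineers to review.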
Google is sharing its approach to machine learning in Gao's white paper, in the hopes that other hyperscale data center operators may be able to develop similar tools.
"This isn't something that only Google or only Jim Gao can do," said Kava. "I would love to see this type of analysis tool used more widely. I think the industry can benefit from it. It's a great tool for being as efficient as possible."