Google made headlines when it revealed that it is using machine learning to optimize its data center performance. But the search giant isn't the first company to harness artificial intelligence to fine-tune its server infrastructure. In fact, Google's effort is only the latest in a series of initiatives to create an electronic "data center brain" that can analyze IT infrastructure.
Automation has always been a priority for data center managers, and has become more important as facilities have become more complex. The DevOps movement seeks to "automate all the things" in a data center, while the push for greater efficiency has driven the development of smarter cooling systems.
Where is this all headed? Don't worry. The data center won't be a portal to Skynet anytime soon. Data center managers love technology, but they don't totally trust it.
“You still need humans to make good judgments about these things,” said Joe Kava, vice president for data centers at Google. “I still want our engineers to review the recommendations.”
Kava said last week that Google has begun using a neural network to analyze the oceans of data it collects about its server farms and to recommend ways to improve them. Kava said the use of machine learning will allow Google to reach new frontiers in efficiency in its data centers, moving beyond what its engineers can see and analyze.
While there have been modest efforts to create unmanned "lights out" data centers, these are typically facilities being managed through remote monitoring, with humans rather than machines making the decisions. Meanwhile, Google and other companies developing machine learning tools for the data center say the endgame is using artificial intelligence to help design better data centers, not to replace the humans running them.
Romonet: predictive TCO modeling
One company that has welcomed the attention around Google's announcement is Romonet, the UK-based maker of data center management tools. In 2010 the company introduced Prognose, a software program that uses machine learning to build predictive models for data center operations.
Romonet focuses on modeling the total cost of ownership (TCO) of operating the entire data center, rather than a single metric such as PUE (Power Usage Effectiveness), which is where Google is targeting its efforts. The company says its predictive model is calibrated to 97 percent accuracy across a year of operations.
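For readers unfamiliar with the metric, PUE is the ratio of a facility's total energy use to the energy consumed by its IT equipment, so a value closer to 1.0 means less overhead from cooling and power distribution. A minimal Python sketch of the arithmetic follows. The function name and the annual figures are invented for illustration and do not come from Google or Romonet.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual figures, for illustration only.
it_kwh = 10_000_000        # energy used by servers, storage and network gear
overhead_kwh = 2_000_000   # cooling, power distribution losses, lighting

annual_pue = pue(it_kwh + overhead_kwh, it_kwh)
print(f"PUE: {annual_pue:.2f}")
```

A TCO model of the kind Romonet describes would treat a figure like this as one input among many rather than as the target.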
Google's approach is "a clever way (albeit a source-data-intensive one) of basically doing what we are doing," Romonet CEO and co-founder Zahl Limbuwala wrote in a blog post. "Joe’s presentation could have been one of ours. They've put their method into the public domain but not their actual software - so if you want what they've got you need to build it yourself. Thus they just shone a light on us that we couldn't have done ourselves."
Romonet's modeling software allows businesses to accurately predict and manage financial risk within their data center or cloud computing environment. Its tools can work from design and engineering documents for a data center to build a simulation of how the facility will operate. Working from engineering documents allows Romonet to provide a detailed operational analysis without the need for thermal sensors, airflow monitoring or any agents – which also allows it to analyze a working facility without impacting its operations.
These types of models can be used to run design simulations, allowing companies to conduct virtual test-drives of new designs and understand how they will impact the facility.
“I can envision using this during the data center design cycle,” said Google's Kava. “You can use it as a forward-looking tool to test design changes and innovations.”
Vigilent: auto-tuning for cooling
While Google is focusing on PUE and Romonet takes a TCO-driven holistic view of the data center, others have focused on using artificial intelligence to automate cooling. Chief among these is Vigilent, which uses machine learning to provide real-time optimization of cooling within server rooms. Vigilent's AI software collects temperature data from wireless sensors distributed throughout the data hall and dynamically manages the environment to address hot spots from shifting workloads.
"Our systems are built around a core artificial intelligence engine that resides in the server," the company says. "It contains sophisticated algorithms and AI technology that learns over time. This begins from the time the system is commissioned and an initial behavior profile is developed during a multi-hour period of perturbation, where responses are provoked and responses measured. It continues throughout the regular use of the system, learning as it simultaneously controls the devices in its network."
Vigilent recently got a major vote of confidence when Schneider Electric announced that it would integrate the Oakland, California-based startup's technology into its data center infrastructure management (DCIM) suite.
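The commissioning step Vigilent describes, in which responses are provoked and then measured, can be illustrated with a toy sketch. This is not Vigilent's algorithm. It assumes a simple approach in which each cooling unit is nudged one at a time and the change at every temperature sensor is recorded. The function names, the step size and the simulated room are all hypothetical.

```python
from typing import Callable, List


def build_influence_profile(
    read_temps: Callable[[List[float]], List[float]],
    baseline: List[float],
    step: float = 0.1,
) -> List[List[float]]:
    """Perturb one cooling unit at a time and record how each sensor responds.

    Returns influence[u][s]: the temperature change at sensor s per unit of
    extra output from cooling unit u.
    """
    base_temps = read_temps(baseline)
    influence = []
    for u in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[u] += step  # provoke a response from unit u only
        temps = read_temps(perturbed)
        influence.append([(t - b) / step for t, b in zip(temps, base_temps)])
    return influence


# Toy stand-in for a real data hall: two cooling units, three sensors.
def simulated_room(outputs: List[float]) -> List[float]:
    return [
        30.0 - 8.0 * outputs[0] - 1.0 * outputs[1],  # sensor near unit 0
        30.0 - 4.0 * outputs[0] - 4.0 * outputs[1],  # sensor between the units
        30.0 - 1.0 * outputs[0] - 8.0 * outputs[1],  # sensor near unit 1
    ]


profile = build_influence_profile(simulated_room, baseline=[0.5, 0.5])
```

In this sketch, a strongly negative entry means the unit cools that sensor effectively. A controller could use such a profile to decide which unit to ramp up when a hot spot appears.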
Finding a comfort level
Artificial intelligence is the newest wrinkle in the ongoing effort to automate the data center. It's taken data center managers a while to get used to the idea of handing over key management tasks to automated tools. The best illustration of this is in cooling, which has traditionally represented the greatest opportunity for savings through improved efficiency. Tools like computational fluid dynamics (CFD) software enabled data center managers to build 3D models of temperatures and airflows, identifying "hot spots" in the cold aisles.
But it has not been an easy sell. HP was on the front lines of this effort in 2006, when it rolled out Dynamic Smart Cooling, the first offering seeking to automate data center cooling. Dynamic Smart Cooling collected data from a sensor network and used it to automate management of computer room air conditioners (CRACs) and air handlers (CRAHs). By 2009, HP was retooling its offering, acknowledging that it was "seeing some resistance" to automating changes to cooling infrastructure.
Switch took a slightly different approach to using automation in cooling. When it built its massive SuperNAP campus, the Las Vegas colocation company developed custom cooling units that sit outside the data center and can automatically switch between six different modes of cooling, depending upon the external weather conditions.
Over the past five years, a number of data center researchers and vendors have focused on automated cooling systems that can adjust to temperature and pressure changes in the server environment, including Opengate Data Systems, Intel, Brocade and SynapSense.
Most recently, Facebook has developed automation to address the tendency for on-board server fans to fight with row-level cooling systems as the temperature rises. A Facebook patent filing describes the use of a load balancer that can redistribute the workload across servers to shift compute activity away from “hot spots” inside racks.
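The general idea of steering work away from hot spots can be sketched in a few lines. This is not the method in Facebook's filing, whose details are not given here. It assumes a simple scheme in which each server's share of new work is proportional to its thermal headroom below a temperature limit. The server names, the 35 C limit and the function itself are hypothetical.

```python
from typing import Dict


def thermal_weights(temps_c: Dict[str, float], limit_c: float = 35.0) -> Dict[str, float]:
    """Return load-balancer weights proportional to each server's thermal headroom."""
    headroom = {name: max(limit_c - t, 0.0) for name, t in temps_c.items()}
    total = sum(headroom.values())
    if total == 0:
        # Every server is at or above the limit: fall back to an even split.
        return {name: 1.0 / len(temps_c) for name in temps_c}
    return {name: h / total for name, h in headroom.items()}


# Hypothetical inlet temperatures in degrees Celsius.
weights = thermal_weights({"rack1-a": 25.0, "rack1-b": 33.0, "rack2-a": 36.0})
```

Here the server above the limit receives no new work, and the coolest server receives the largest share.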
The bottom line: As we see greater acceptance of automation in the broader data center sector, hyperscale data centers will lead the way in the use of machine learning to enhance designs for peak efficiency. As these tools develop a track record, they will be of greater interest to the enterprise.