Cooling system pipes inside a Google data center Alphabet/Google

AI in Data Center Management: What It Means for Staffing and Processes

Critical Thinking: Machine learning promises to usher in a new era of advanced data center management, but many facilities still have a long way to go from arcane, spreadsheet-based management habits to the automation Promised Land.

Critical Thinking is a weekly column on innovation in data center infrastructure design and management. More about the column and the author here.

The end game for data center infrastructure management (DCIM) software is that it eventually enables self-managing, or fully autonomic, data centers.

The hope is that AI-driven management software (likely cloud-based) will monitor and control IT and facilities infrastructure, as well as applications, seamlessly and holistically – potentially across multiple sites. Cooling, power, compute, workloads, storage, and networking will flex dynamically to achieve maximum efficiency, productivity, and availability.

Facilities equipment and IT will also be self-healing to some degree: cloud-based analytics applied to sensor data harvested from thousands of sites will guide and enact targeted predictive and preventive maintenance programs. Spare parts will be ordered, tested, and installed (perhaps by dexterous robots) exactly when they are needed, avoiding failures as well as unnecessary maintenance and testing.

This kind of AI-driven management may be a decade or more away, but the industry has made inroads toward achieving some of its aspects. For example, Google revealed back in 2014 that it had been using technology gained through its purchase of UK-based AI specialist DeepMind to improve data center facilities equipment management in some of its sites.

As Google pointed out, with so many pieces of power and cooling equipment interacting, data center facilities management is arguably too complex to be left to humans. The company said at the time:

Consider one simplified scenario: just 10 pieces of equipment, each with 10 settings, would have 10 to the 10th power, or 10 billion, possible configurations, a set of possibilities far beyond the ability of anyone to test for real — but far fewer than an actual data center’s possible configurations.
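The arithmetic in Google's scenario is easy to verify. The short sketch below computes the size of the configuration space and, purely for illustration, how long it would take to try every combination. The function names and the rate of one test per second are assumptions made here, not figures from Google.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def configuration_count(num_devices: int, settings_per_device: int) -> int:
    """Joint configurations when every device is set independently."""
    return settings_per_device ** num_devices


def years_to_test_all(num_devices: int, settings_per_device: int,
                      seconds_per_test: float = 1.0) -> float:
    """Time to try every configuration once (test rate is an assumption)."""
    total = configuration_count(num_devices, settings_per_device)
    return total * seconds_per_test / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{configuration_count(10, 10):,} configurations")
    print(f"about {years_to_test_all(10, 10):.0f} years at one test per second")
```

Even at that optimistic pace, exhaustive testing of the simplified case would take roughly three centuries, which is why a model trained on historical data is attractive.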

AI-Driven Efficiency

Google used historical data collected by thousands of sensors within its data centers to train an “ensemble of deep neural networks.” Applying the resulting algorithms to its infrastructure, Google says, it achieved a 40 percent reduction in energy used for cooling and a 15 percent reduction in overall energy overhead. It is continuing to develop and refine its use of machine learning – a subset of AI and the current state of the art in the space – and will no doubt achieve even better results.

But it’s not just advanced cloud operators such as Google that are experimenting with ML. DCIM software supplier Vigilent says it has been integrating ML into its Dynamic Cooling Management System for several years:

Every minute, data from hundreds or thousands of environmental sensors is collected across the wireless mesh network, where it makes its way to the central artificial intelligence (AI) Engine… sophisticated dynamic control algorithms then send commands to the facility’s cooling system in real-time, making decisions designed to optimize performance.
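The sense-decide-act loop described in that quote can be sketched in a few lines. This is not Vigilent's algorithm, which is proprietary and far more sophisticated. It is a minimal proportional controller, and every name, setpoint, and gain in it is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CoolingUnit:
    name: str
    fan_speed_pct: float  # current fan speed, 0-100


def next_fan_speed(unit, inlet_temps_c, target_c=25.0, gain=5.0,
                   min_pct=20.0, max_pct=100.0):
    """Nudge fan speed in proportion to how far the hottest inlet is from target."""
    error = max(inlet_temps_c) - target_c
    proposed = unit.fan_speed_pct + gain * error
    # Clamp so the controller never stops the fans or exceeds full speed.
    return max(min_pct, min(max_pct, proposed))


def control_cycle(units, readings):
    """One pass of the loop: read sensors, decide, issue commands.

    `readings` maps a unit name to the inlet temperatures (deg C) it serves.
    Units with no data are left untouched rather than guessed at.
    """
    commands = {}
    for unit in units:
        temps = readings.get(unit.name)
        if not temps:
            continue
        unit.fan_speed_pct = next_fan_speed(unit, temps)
        commands[unit.name] = unit.fan_speed_pct
    return commands


if __name__ == "__main__":
    units = [CoolingUnit("crah-1", 50.0), CoolingUnit("crah-2", 50.0)]
    print(control_cycle(units, {"crah-1": [24.0, 27.0], "crah-2": [22.0, 23.0]}))
```

A production system would run a cycle like this every minute across the whole sensor network and would replace the fixed gain with a learned model of how each cooling unit influences each sensor.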

We can expect more DCIM suppliers and colocation and cloud providers with homegrown tools to integrate ML and other forms of AI into management systems in the near future. The move from isolated on-premises DCIM software to cloud-based data center management-as-a-service (DMaaS) tools – where data from multiple sites is aggregated in the cloud – should also help accelerate this process.

Long Way from Spreadsheets to AI

But while it’s easy to get caught up in the exciting and disruptive potential of AI, it’s also important to reflect on the reality of how most data centers continue to be designed, built, and operated.

The fact is that a lot of the processes – especially on the facilities side – are still firmly rooted in the mundane and manual. Case in point: as we have previously highlighted, DCIM tools have been around for close to a decade, but large numbers of data center operators remain skeptical about the technology. As many as 50 percent of sites – probably those at the smaller end of the spectrum – still rely on trusted, but less intelligent, building management tools, as well as spreadsheets, written documents, and other manual processes to run their facilities.

Being Digital

So-called operations and maintenance, or O&M, practices are still routinely detailed in paper documents – or inside the heads of facilities staff. This is despite the development of software tools – including some DCIM software as well as specialist computerized maintenance management systems (CMMS) – to help manage and automate the application of these vital management procedures.

Before they can start to take advantage of the advanced AI-enabled management tools that may soon emerge, data center operators will need to have tackled the low-hanging fruit of smarter operations. These steps include:

  • Deploying on-premises or cloud-based DCIM tools for asset management and environmental monitoring. This monitoring and management layer will need to be in place before some of the more sophisticated actions enabled by AI can be implemented.
  • Installing more sensors and meters – including acoustic and vibration devices – to closely monitor temperature, humidity, power quality, and other metrics. ML tools will require more and more data.
  • Better aligning IT and facilities teams (supported by DCIM software) so the facility is managed more holistically.
  • Digitizing and automating as many previously manual processes and procedures as is feasible.
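Even before any ML is involved, digitized sensor data makes simple automated checks possible. The sketch below is a basic statistical baseline, not an ML model: it flags readings that stray far from the recent average. The function name, window size, threshold, and sample temperatures are all assumptions for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0, min_sigma=0.1):
    """Return indices of readings that deviate sharply from the preceding window.

    A reading is flagged when it differs from the mean of the previous
    `window` readings by more than `threshold` standard deviations.
    `min_sigma` stops a very quiet sensor from triggering on tiny wobbles.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        sigma = max(stdev(history), min_sigma)
        if abs(readings[i] - mean(history)) > threshold * sigma:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical inlet temperatures in deg C, one per minute.
    inlet_temps = [22.0, 22.1, 21.9, 22.0, 22.1, 22.0, 27.5, 22.1]
    print(flag_anomalies(inlet_temps))
```

A check like this only works once readings are collected continuously and stored digitally, which is the point of the groundwork listed above.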

What Happens to the Data Center Staff?

Another elephant in the room in any conversation about AI-enabled data center management is what it will mean for facilities and IT staff. As we have previously highlighted, there is a move toward “lights-out” data centers, where the management of IT and some facilities infrastructure is automated and carried out remotely. As AI tools become more developed, it’s likely that process will intensify and proliferate to more kinds of sites.

There will inevitably continue to be a reduction in the number of staff required on-site at any one facility. But rather than jobs being lost overall, more operations staff are likely to work for services companies – such as facilities management services – supporting multiple operators and sites.

Rise of the Machines?

For every positive story about the potential benefits of AI, there are also warnings – often through books and films – of machines running amok and even threatening human life. That’s probably a little far-fetched for the world of data centers, but as Google nearly found to its cost, the answers and actions delivered by AI systems may not always be what was originally anticipated.

Just as Skynet in the film The Terminator took a dispassionate, logical view of preventing conflict, finding that mankind was the problem, Google’s algorithm reached a very simple and accurate conclusion about improving the efficiency of its sites:

The model’s first recommendation for achieving maximum energy conservation was to shut down the entire facility, which, strictly speaking, wasn’t inaccurate, but wasn’t particularly helpful either.
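The anecdote comes down to a missing constraint: ask only for minimum energy, and zero is the right answer. The toy sketch below makes the point with three hypothetical configurations. The names and power figures are invented for illustration and do not come from Google.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    name: str
    it_load_kw: float      # useful IT work the configuration supports
    total_power_kw: float  # IT load plus cooling and other overhead


def lowest_energy(configs, required_it_load_kw: Optional[float] = None) -> Config:
    """Pick the lowest-power configuration, optionally subject to a load constraint."""
    feasible = [
        c for c in configs
        if required_it_load_kw is None or c.it_load_kw >= required_it_load_kw
    ]
    if not feasible:
        raise ValueError("no configuration can carry the required IT load")
    return min(feasible, key=lambda c: c.total_power_kw)


# Hypothetical candidates, including the degenerate one.
CANDIDATES = [
    Config("shut down facility", 0.0, 0.0),
    Config("default cooling", 1000.0, 1600.0),
    Config("tuned cooling", 1000.0, 1450.0),
]

if __name__ == "__main__":
    print(lowest_energy(CANDIDATES).name)          # objective only
    print(lowest_energy(CANDIDATES, 1000.0).name)  # objective plus constraint
```

Without the load requirement, the search picks the shutdown option. With it, the search picks the most efficient configuration that still does the job.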

Given the potential for unintended consequences, it is probably no bad thing that preparing for AI-driven management is likely to be a slow, deliberate process requiring a lot of groundwork.
