The work of data center management is changing quickly. Managers now have to deal with hybrid and multi-cloud environments, edge computing, and a constant onslaught of rapidly evolving cybersecurity threats.
AI promises to – any day now – come to the rescue of IT warriors, give them the silver bullet, the answer to all complexities they struggle against. Self-learning systems will adapt on their own to fast-evolving environments, protect against known and unknown threats, respond instantaneously with super-human accuracy, and do it all on the cheap.
In theory, anyway; in practice, not so much. Not yet, and probably not for a long time, due to siloed systems and a lack of integrated management platforms.
Data center complexity has been increasing exponentially, said Amr Ahmed, managing director at EY Consulting Services. In the past, a company might have had one mainframe. Then, with client-server, the environment grew to tens, hundreds, or thousands of machines, he said. "The distributed environment – hundreds of thousands; virtualization – millions; cloud – tens of millions." That's beyond human ability to manage. "AI is essential," he told DCK. "There is no way to work around it. It is not a choice. It is not optional."
The biggest cloud providers, the hyperscalers, have been applying machine learning (a type of AI) to this problem of scale for a while. "Predictions of failure, moving workload around automatically – these things are not stuff that's going to happen in the next ten years," he said. "It already exists. The cloud services providers are already using this in their cloud environments. That is how they can offer their services at scale."
Particularly in the area of data center power and cooling, advanced analytics have been used to reduce energy costs for years. "There are many tools that analyze this data and make decisions," Ahmed said.
When AI can help improve a data center's uptime, it's a clear and obvious benefit – and a big area of focus for large data center operators. AI and ML can be used to predict failure of critical tasks and avoid unexpected system and service failures or data center outages, said Dan Simion, VP of AI and analytics at Capgemini. "This approach creates a self-healing mechanism," he told DCK.
While the larger data center providers are taking the lead here, high tech companies may also be building these kinds of AI systems from scratch, if it's in their wheelhouse, he added.
The most digitally mature companies are already seeing value from their AI investments, he said, as are companies with large data centers.
AI Hopes Crash into Silo Walls
For smaller data centers, the easiest way to start deploying AI is to rely on technology vendors. However, there are limits to this approach, namely, the difficulty of dealing with interdependency and business context.
To do its best work, AI needs situational awareness. That’s hard to get for an AI system that’s limited to a single vendor’s product and its functions.
"When I see a spike in my network, or in my utilization of compute, or of power, that could be related to a change in my workforce," Ahmed said. More people could be working from home, for example. It could be due to a major platform upgrade being rolled out – or to something nefarious. "Adding that business context adds a third dimension to complexity.”
Most vendors are still in the early stages of adding AI and ML capabilities to individual products. For example, a product may offer alerts of unusual activity (one of the most common use cases for machine learning), but not much else. More advanced vendors can offer predictive analytics, action recommendations, or even automatic issue remediation.
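To make the "alerts of unusual activity" case concrete, here is a minimal sketch in Python of one simple way such an alert could work: flag a reading that strays too far from its recent history. It is an illustration only, not any vendor's method. The function name `flag_anomalies`, the window size, the threshold, and the sample utilization figures are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard
    deviations (a trailing z-score check).

    All names and default values here are hypothetical.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to measure
            # deviation against, so this sketch skips the reading.
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up CPU utilization percentages with one obvious spike.
utilization = [41.0, 43.0, 42.0, 40.0, 44.0, 42.0, 43.0, 95.0, 42.0, 41.0]
print(flag_anomalies(utilization))
```

A check like this only says that something is unusual. It cannot say whether the spike came from a platform upgrade, a shift to remote work, or an attack, which is the business context a single product's data does not contain.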
A more holistic, and more effective, approach to AI is domain-agnostic, extracting data from all systems. For the most part, that capability is still in its early days.
To start with, there are often organizational obstacles. "It's all in silos," Ahmed said. "There's the network team, there's the infrastructure team managing this, and the operations team managing that. Bringing it all together and using AI and ML to make sense out of it takes time."
It's simpler to deploy AI tools on individual systems, but some organizations are starting to embrace a more centralized approach. "They are changing the way they operate," he said.
Laying the Groundwork Early
Forward-thinking data center managers are designing their systems with AI in mind.
One attractive use case is predicting equipment failure early enough to get the device fixed or replaced before it brings down the data center.
"Vendors – everybody – have been talking about this nirvana of AI or machine learning to predict when failure is going to happen," said Brent Bensten, CTO of products at QTS Realty Trust, a major US data center provider.
This capability requires the kind of holistic view across disparate systems that is still so hard to get. To determine when a device might break down, you may need temperature data, utilization history, power draw data, and so on.
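As a rough sketch of what blending those sources might look like, the Python below joins three separate per-device feeds into one record per device, the kind of unified view a prediction model would need as input. It assumes each silo can export readings keyed by a shared device ID. The `DeviceRecord` type, the `blend` function, and all sample values are hypothetical and do not describe any particular platform.

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    """One unified row per device (hypothetical schema)."""
    device_id: str
    temperature_c: float
    utilization_pct: float
    power_draw_w: float


def blend(temperature, utilization, power):
    """Join three per-device mappings into unified records.

    Devices missing from any one source are left out, since a
    partial record would hide the gap from whatever consumes it.
    """
    shared = temperature.keys() & utilization.keys() & power.keys()
    return [
        DeviceRecord(d, temperature[d], utilization[d], power[d])
        for d in sorted(shared)
    ]


# Made-up exports from three separately managed systems.
cooling_feed = {"rack1-srv01": 31.5, "rack1-srv02": 44.0}
compute_feed = {"rack1-srv01": 62.0, "rack1-srv02": 97.0, "rack2-srv01": 15.0}
power_feed = {"rack1-srv01": 410.0, "rack1-srv02": 655.0}

records = blend(cooling_feed, compute_feed, power_feed)
for record in records:
    print(record)
```

In this example `rack2-srv01` drops out because only the compute team's feed knows about it, a small-scale version of the silo problem.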
"Being siloed makes it difficult," Bensten told DCK. "It's not until you can blend systems together with other systems to make them smarter that AI and ML become powerful. That's my opinion."
QTS has been investing in a unified platform for its infrastructure management needs for the past four years. "We take them all, and make them one, and then we can do AI and ML on top of it," he said.