The rise of machine learning is driving up power densities inside the data centers of companies that deploy servers filled with GPUs and other accelerator processors. At 30kW to 50kW per rack and climbing, these densities are pushing some operators to turn to liquid instead of air for cooling.
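A rough back-of-the-envelope calculation shows how accelerator-dense racks land in that 30kW-to-50kW range. All figures below are illustrative assumptions, not vendor specifications:

```python
# Hypothetical rack configuration: GPU-dense servers plus per-server
# overhead (CPUs, memory, fans, power-supply losses). All numbers are
# assumed for illustration.
servers_per_rack = 8
gpus_per_server = 8
gpu_watts = 400                # assumed per-accelerator draw
server_overhead_watts = 1500   # assumed non-GPU draw per server

rack_kw = servers_per_rack * (gpus_per_server * gpu_watts + server_overhead_watts) / 1000
print(f"Estimated rack power: {rack_kw:.1f} kW")
```

With these assumptions the rack draws about 37.6kW, squarely in the range that strains traditional air cooling.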
While some data center operators use liquid cooling to make their facilities more efficient, most of the growth in adoption of these technologies is driven by the need to cool higher-density racks.
But the switch from air to liquid isn’t simple. Here are some of the biggest barriers to adoption of liquid cooling in data centers:
1. Two Cooling Systems Instead of One
It rarely makes sense for an existing data center to convert to liquid cooling one rack at a time. The facilities team would have to manage two cooling systems instead of one, said Lex Coors, chief data center technology and engineering officer at Interxion, the European colocation giant.
That makes liquid cooling a better option for new data centers, or ones that are due for a major overhaul.
There are always exceptions. That’s especially true for hyperscalers, whose unique infrastructure problems often require unique solutions.
For example, Google is currently converting many of the rows in its existing data centers to liquid cooling to deal with the power density of its latest machine-learning processor, TPU 3.0.
2. No Standards
Lack of industry standards around liquid cooling is a major obstacle to broad adoption of the technology.
"The customer, first of all, has to come with their own IT equipment ready for liquid cooling," Coors said. "And it's not very standardized -- we can't simply connect it and let it run."
Interxion doesn't currently have any customers using liquid cooling, but the company is ready to support it if necessary, according to Coors.
3. Leaks
Many liquid-cooling solutions rely on dielectric liquid, which doesn't conduct electricity and presents no risk of electrocution. But some use chilled or warm water.
"If you touch it, and it starts leaking at that moment, you would get electrocuted," Coors said. "But there are ways around it."
4. Corrosion
As with any system that involves water flowing through pipes, corrosion is an issue in liquid cooling.
"Corrosion in those small pipes is a big issue, and this is one of the things we are trying to solve today,” Coors said.
Manufacturers are improving the pipes to reduce the risk of leaks and to make the pipes seal automatically if a leak does happen.
Meanwhile, the racks themselves can be containerized, he added. "So if you have a leak, it would only splash the water on that rack, nothing else."
5. Operational Complexity
Perhaps the biggest risk is that of increased operational complexity, said Jeff Flanagan, executive VP at Markley Group, a provider of colocation and cloud computing services that’s planning to roll out liquid cooling in a high-performance cloud data center early next year.
"As a data center operator, we prefer simplicity," he said. "The more components you have, the more likely you are to have failure. When you have chip cooling, with water going to every CPU or GPU in a server, you're adding a lot of components to the process, which increases the potential likelihood of failure."
There’s also a whole other set of complicating factors when operating a data center where servers are submerged in dielectric fluid, which we covered earlier.