While water is better at absorbing and transporting heat than air, the prevalent heat-transfer medium in the computing industry has been air. That’s because it’s less complicated and less expensive to build and maintain an air-based data center cooling system; because closely coupling water loops with what often amounts to millions of dollars’ worth of computing equipment makes some people nervous; and (perhaps primarily) because air has worked just fine for your average enterprise server.
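How much better water is at carrying heat can be shown with a quick back-of-envelope comparison using round textbook values for specific heat and density (these figures are standard physical constants, not numbers from Lenovo):

```python
# Rough comparison of water vs. air as a heat-transfer medium,
# using round textbook values. Illustrative only.

WATER_CP = 4186      # specific heat of water, J/(kg*K)
WATER_RHO = 1000     # density of water, kg/m^3
AIR_CP = 1005        # specific heat of air, J/(kg*K)
AIR_RHO = 1.2        # density of air at room conditions, kg/m^3

# Heat absorbed per cubic meter of medium per degree of temperature rise.
water_j_per_m3_k = WATER_CP * WATER_RHO
air_j_per_m3_k = AIR_CP * AIR_RHO

ratio = water_j_per_m3_k / air_j_per_m3_k
print(f"Water carries roughly {ratio:.0f}x more heat per unit volume than air")
```

Per unit volume, water absorbs on the order of a few thousand times more heat than air for the same temperature rise, which is why liquid cooling can handle densities that air cannot.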
Yes, there’s High Performance Computing, the subset of the industry where liquid cooling has been a mainstay for decades. But supercomputers, whose designers pack as many high-performance CPUs and GPUs into every server chassis as they possibly can, represent a relatively small portion of the market.
According to the hardware maker Lenovo (and others), we’ll soon start seeing the various forms of liquid cooling make their way into a wider variety of data centers, well beyond the HPC realm, driven by the rise of compute-heavy machine learning applications, big data analytics, and virtualization. Lenovo is betting on this trend by investing in development of multiple liquid cooling designs, expecting more customers to use them.
Servers Increasingly Power-Hungry
Processors are getting more and more powerful. Even though the companies that design them are good at extracting a lot of computing power out of every watt of electricity, newer chips still consume more power than ever before. Other components that surround the chips on the motherboard – memory, storage devices, and IO cards – are also growing more powerful and more power-hungry, Scott Tease, executive director of HPC and AI at Lenovo, told Data Center Knowledge in an interview.

As the trend toward more and more powerful servers continues, it’s going to be increasingly difficult to cool them with air. “Anyone that’s going to push a server hard is going to run into these kinds of problems as you go forward,” he said.
Lenovo expects CPUs to push above 240 watts and co-processors (such as GPUs) to cross 300 watts. CPU-only servers, the company said, will push past 1kW.
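Those per-server figures translate directly into rack-level density. A rough sketch, assuming a fully populated 42U rack of 1U servers at the article's projected 1kW each (the rack configuration is an illustrative assumption, not a Lenovo specification):

```python
# Back-of-envelope rack-density estimate from the article's projection
# that CPU-only servers will push past 1 kW each. The 42U/1U rack
# configuration is an assumption for illustration.

server_power_w = 1000        # ~1 kW per server (article's projection)
servers_per_rack = 42        # one 1U server per slot in a 42U rack

rack_power_kw = server_power_w * servers_per_rack / 1000
print(f"Estimated rack power: {rack_power_kw:.0f} kW")
```

A rack in the low-to-mid 40kW range is well past what most air-cooled facilities are designed to handle, which is the pressure driving the designs below.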
On Tuesday, Lenovo unveiled two designs, one of which helps bring the efficiency of liquid cooling to data centers without the disruption of ripping and replacing the entire cooling system. “We give people a very easy way to bring water in. No data center changes needed,” Tease said. The second technology is much more invasive but designed for maximum energy efficiency.
‘A Very Special Heat Sink’
The non-disruptive innovation is essentially “a very special heat sink,” Tease said. It’s a heat sink that’s filled with liquid and, according to Lenovo, allows for higher-density processors, while reducing the amount of airflow necessary to cool the server.
In most two-CPU servers, air travels over one CPU before reaching the other. By the time it makes it to the second chip in this “shadowed CPU” design, the air has already been warmed by the first one. Lenovo’s new heat sink, called the Thermal Transfer Module, uses the liquid inside it to carry heat away from both CPUs to a portion of the module with a large surface area, where the heat is dispersed before being expelled.
Expected to be available toward the end of this month as an option when buying Lenovo’s high-density servers, the TTM will add no more than $25 to each server’s price, Tease said. It can cool 205W CPUs using liquid without requiring an additional water loop in the data center.
Warm Water Cooling
The overall idea behind the second cooling design isn’t new. It’s using warm water to cool servers by bringing the water directly to the chips.
It’s energy efficient because it can cool 45kW per rack (“and potentially well beyond that”) using water that’s 50 degrees Celsius. At that temperature, the system, called DTN (Direct to Node), doesn’t need energy-guzzling mechanical chillers to cool the water further.
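The flow rate such a loop needs can be sketched with the standard heat-transport relation Q = ṁ·c·ΔT, taking the article's 45kW rack load, 50-degree inlet water, and an assumed 10-degree temperature rise (the 60-degree outlet figure appears later in the article; the exact rise is an assumption here):

```python
# Rough flow-rate estimate for removing 45 kW per rack with warm water,
# using Q = m_dot * c_p * delta_T. Inlet 50 C, outlet ~60 C assumed;
# illustrative figures, not Lenovo specifications.

Q_W = 45_000          # heat load per rack, watts
CP_WATER = 4186       # specific heat of water, J/(kg*K)
DELTA_T = 10          # assumed temperature rise across the rack, K

m_dot = Q_W / (CP_WATER * DELTA_T)   # mass flow, kg/s
liters_per_min = m_dot * 60          # ~1 kg of water per liter
print(f"~{m_dot:.2f} kg/s (~{liters_per_min:.0f} L/min) per rack")
```

On these assumptions, a single garden-hose-scale flow of roughly a liter per second is enough to carry away an entire 45kW rack's heat, which illustrates why no chillers are needed when the return water can be rejected or reused at elevated temperature.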
The idea of doing away with chillers by bringing warm water directly to the chips, relying on water’s natural heat-transfer properties, has been around for some time. eBay, for example, uses a similar system, designed by Dell, in one of its data centers.
At the data center of its customer LRZ, the Leibniz Supercomputing Center outside of Munich, Lenovo took the design further by adding the capability to capture heat energy from the water in the cooling loop and create additional cooling capacity.
Not all devices in the data center, which houses about 6,500 servers and 100 petabytes of storage, can be water-cooled; storage and networking equipment, for example, cannot. That means the facility still needs some amount of air conditioning.
To supplement the room’s traditional air-conditioning system, warm water carrying heat from the servers (at this point about 60 degrees Celsius) is piped into an absorption chiller. The chiller evaporates the water, causing it to shed its heat, and then pressurizes it to turn it back into liquid. Since the water is already warm going in, it doesn’t take much energy to vaporize it, Tease explained.
More Liquid Cooling Expected to Flow Into Enterprise
Most of Lenovo’s liquid cooling customers are still in the HPC space, Tease admitted, but he expects this to change. The company announced its water-filled heat sink and its warm-water cooling design as parts of a three-product liquid cooling portfolio it called Neptune. (The third product is a rear-door heat exchanger, at this point a common approach to cooling high-density racks, including in the AI space).
The fact that Lenovo unveiled the Neptune portfolio in the run-up to ISC, the big annual HPC industry conference kicking off in Frankfurt this Sunday, also points to the outsize role supercomputers still play in the market for liquid-cooled data centers.
But Lenovo isn’t the only company saying that the use of high-density compute systems is spreading beyond HPC.
More powerful hardware is allowing companies to put more VMs on a single physical host than ever before. There’s pressure on data center operators to consolidate workloads and squeeze as much value out of every box they buy and support as they possibly can, so even traditional enterprise workloads are driving up power density.
While the big cloud providers, such as Microsoft Azure and Google Cloud Platform, are reporting high demand for their GPU infrastructure services for AI, Nvidia, which supplies most of those GPUs, says it’s been seeing a lot of deployments of this kind of infrastructure by enterprises in their own data centers as well.
Google is already using liquid to cool its own custom AI chips, called Tensor Processing Units, or TPUs, and if the systems enterprises deploy on-premises reach similar power levels, it’s not a stretch to imagine that the demand for liquid cooling in the enterprise will grow.