Traditionally reserved for mainframes and academic supercomputers, liquid cooling may soon be seeping into more enterprise data centers. New, more demanding enterprise workloads are pushing up power densities, leaving data center managers looking for more efficient alternatives to air-based cooling systems.
We asked a number of data center operators and vendors about the applications driving liquid cooling into the mainstream. Some declined to name specific applications, saying they viewed those workloads, and the way they cool them, as a competitive advantage.
Hyperscale cloud operators, including Microsoft, Alphabet’s Google, Facebook, and Baidu, have formed a group working on an open specification for liquid-cooled server racks without saying exactly what they would use them for. At least one category of workloads in the hyperscalers’ arsenal, however, clearly calls for liquid cooling: machine learning systems accelerated by GPUs or, in Google’s case, also by TPUs, which the company has said publicly are now cooled using a direct-to-chip liquid cooling design.
Despite operators’ caginess around this subject, some usage trends are starting to emerge. If you're supporting any of the following workloads in your data center, liquid cooling may be in your future too:
1. AI and Accelerators
In recent years, the steady CPU performance gains long associated with Moore’s Law have slowed. Partly for that reason, accelerators (mainly GPUs, but also FPGAs and ASICs) are increasingly making their way into enterprise data centers.
GPU-powered machine learning may be the most common use of hardware acceleration outside of the HPC realm. However, about a third of IT service providers told 451 Research in a recent survey that they were planning to accelerate systems for online data mining, analytics, engineering simulation, video, other live media, fraud detection, load balancing, and similar latency-sensitive services.
Hardware accelerators have much higher thermal design power (TDP) ratings than CPUs and typically need 200W or more of cooling each; add a high-performance server CPU, and individual systems can need more than 1kW of cooling.
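To see how a single box crosses the 1kW mark, here is a rough back-of-the-envelope sketch in Python. The component wattages are hypothetical placeholders, not vendor specifications, and the sketch assumes a server's heat load roughly equals its electrical draw.

```python
# Hypothetical component figures for illustration only; real TDPs vary by model.
ACCELERATOR_TDP_W = 250  # assumed per-accelerator TDP (the text says 200W or more)
CPU_TDP_W = 205          # assumed high-end server CPU TDP
OTHER_W = 150            # assumed memory, drives, fans, and power-conversion losses


def server_heat_load_w(accelerators: int, cpus: int) -> int:
    """Estimate heat a server must reject, assuming it roughly equals power draw."""
    return accelerators * ACCELERATOR_TDP_W + cpus * CPU_TDP_W + OTHER_W


# A hypothetical box with four accelerators and one CPU.
print(server_heat_load_w(accelerators=4, cpus=1))  # 1355
```

With these assumed figures, four accelerators and one CPU already land well above 1kW; swapping in real TDPs from a spec sheet is the only change needed to apply it to actual hardware.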
Intel, too, is pushing its server processors past the 150W limit it has traditionally designed them to. “More and more people want more powerful chips, and we’re starting to see the watts creeping up,” Andy Lawrence, an executive director at the Uptime Institute, told us.
Rack density is rising. Most data centers Uptime tracks now have at least some racks that are over 10kW, and 20 percent have at least one rack at 30kW or higher. But those loads aren’t considered high-performance computing. “They just say they have a higher-density rack for their workloads,” Lawrence said.
“If people are putting a GPU in alongside an Intel processor, they’ll probably get three times the density they were previously getting,” he said. Liquid cooling is an obvious fit for these accelerators, especially immersion cooling, which can cool both GPUs and CPUs.
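At the rack level, the arithmetic is roughly servers per rack times per-server draw. The minimal sketch below assumes a hypothetical 16-server rack and hypothetical per-server wattages, with the GPU-equipped server drawing three times as much to mirror Lawrence’s rough estimate; only the 10kW and 30kW marks come from the Uptime figures cited earlier.

```python
def rack_density_kw(servers_per_rack: int, watts_per_server: float) -> float:
    """Approximate rack density in kW from server count and per-server draw."""
    return servers_per_rack * watts_per_server / 1000


def density_band(kw: float) -> str:
    """Bucket a rack against the 10kW and 30kW marks Uptime reports on."""
    if kw >= 30:
        return "30kW or higher"
    if kw > 10:
        return "over 10kW"
    return "10kW or less"


SERVERS = 16  # hypothetical servers per rack
# Hypothetical per-server draw; the GPU server is 3x the CPU-only one.
for label, watts in [("CPU-only", 500), ("CPU + GPU", 1500)]:
    kw = rack_density_kw(SERVERS, watts)
    print(f"{label}: {kw:.1f} kW ({density_band(kw)})")
```

Under these assumptions the same rack jumps from 8kW to 24kW, moving it into the band where air cooling starts to get difficult.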
2. Cooling High-Density Storage
Storage density continues to increase, and cooling storage efficiently can be hard. Much of the installed storage capacity in data centers consists of unsealed hard disk drives, which cannot be cooled with liquid. Newer technologies, however, are more promising in that respect. Solid-state drives (SSDs), for example, can be cooled with full-immersion solutions. Additionally, the latest generation of high-density hard drives uses a helium atmosphere to support their high-density, high-speed read/write heads, which requires the units to be sealed and also makes them suitable for liquid cooling.
As the 451 report notes, the combination of SSDs and helium-filled HDDs means there’s no need to separate air-cooled storage from liquid-cooled processing. There’s an additional benefit of improved HDD reliability, since immersing drives in cooling fluid can reduce the effects of heat and humidity on components.
3. At the Edge
The need to reduce latency for current and future applications is driving demand for a new generation of data centers on the network edge. These can be densely populated remote facilities at wireless towers, on factory floors, or in retail stores. It’s possible that more and more of them will host high-density compute hardware, such as GPU-packed clusters for machine learning.
While not all edge data centers will be liquid-cooled, many will be designed to use it to support heavy workloads in confined spaces, where traditional cooling options won’t be available, or in new deployments, where there’s no requirement to use legacy cooling. Because it lowers energy consumption, liquid cooling makes it easier to deploy edge sites in locations where high-capacity power feeds aren’t available.
In Lawrence’s estimate, as much as 20 percent of edge data centers could use liquid cooling. He envisions lights-out micro-modular high-density sites supporting 40kW per rack.
4. High Frequency Trading and Blockchain
Many modern financial services workloads are computationally intensive, requiring high-performance CPUs as well as GPUs. These include high-frequency trading systems and blockchain-based applications like smart contracts and cryptocurrencies.
GRC, formerly called Green Revolution Cooling, has one high-frequency trading firm testing its immersion-cooling solution. The vendor also saw its biggest-ever spike in sales when it introduced an immersion-cooling product for cryptocurrency mining just as the price of bitcoin began surging in late 2017.
Another GRC customer in Trinidad and Tobago is running a cryptocurrency service at 100kW per rack, with a warm water-based cooling loop connected to an evaporation tower, GRC CEO Peter Poulin told us. Because warm-water cooling is more energy efficient than cold-water cooling, the service can run in tropical conditions with no mechanical chillers.
5. Cooling is Expensive
Liquid cooling doesn’t only make sense when an air-based system cannot handle the density.
Geosciences company CGG uses GRC’s immersion systems to cool a Houston data center where it does seismic data processing on commodity servers with powerful GPUs that consume up to 23kW per rack. That’s relatively high, but that kind of density is often cooled with air. “We put heavy compute into the immersion tanks for cooling,” Ted Barragy, manager of CGG’s advanced systems group, said. “But it’s not so much about the application workload as the economics of immersion.”
Immersion cooling replaced legacy cooling equipment in an old CGG data center during an upgrade. According to Barragy, the team recovered almost a megawatt of power capacity as a result of the upgrade. Even after a couple of years of adding servers and immersion tanks, “we still have half a megawatt of power we haven’t been able to use,” he said. “This was an old legacy data center, and half the power was going into inefficient air systems.”
The PUE (power usage effectiveness) of the immersion-cooled data center is about 1.05, Barragy said. That’s better than the 1.35 PUE of another CGG data center in Houston, which is newer but air-cooled.
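PUE is total facility power divided by the power delivered to IT equipment, so a PUE of 1.0 would mean zero cooling and distribution overhead. As a rough illustration, assuming a hypothetical 1MW IT load (not a figure CGG reported), the gap between the two Houston sites works out to roughly 50kW versus 350kW of non-IT overhead:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_kw


def overhead_kw(it_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power distribution, lighting) implied by a PUE."""
    return it_kw * (pue_value - 1)


IT_LOAD_KW = 1000  # hypothetical 1MW IT load, for comparison only
for label, p in [("immersion-cooled", 1.05), ("air-cooled", 1.35)]:
    print(f"{label}: PUE {p} -> ~{overhead_kw(IT_LOAD_KW, p):.0f} kW overhead")
```

Because overhead scales with IT load times (PUE − 1), a 0.3 difference in PUE means seven times less overhead here, which is the kind of saving Barragy describes as the economics of immersion.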
“Many people think this is just a solution for really high-density 60kW to 100kW per rack computing, but for our mainstream customers there are other significant benefits,” Poulin said.
Uptime Institute CTO Chris Brown said he was seeing growing interest in liquid cooling for general workloads, driven by the promise of higher energy efficiency and lower operating costs.
“The conversation is not around super high-density, but just cooling that [data center managers] can use for any of their IT assets,” he said. “It’s getting into more common-density solutions and more run-of-the-mill data centers.”