Liquid cooling can be more efficient than air cooling, but data center operators have been slow to adopt it for a number of reasons, ranging from the disruption it brings to installation and management to it simply being unnecessary. In most cases where it does become necessary, the driver is high power density. So, if your data center (or parts of it) has reached the level of density that calls for liquid cooling, how will your day-to-day routine change?
Depending on how long you’ve been working in data centers, liquid cooling may seem brand new (and potentially disturbing) or pretty old-school. “Back in the 80s and 90s, liquid cooling was still common for mainframes as well as in the supercomputer world,” Chris Brown, CTO of the Uptime Institute, says. “Just being comfortable with water in data centers can be a big step. If data center managers are older, they may find it familiar, but the younger generation is nervous of any liquid.”
There’s often an instinctive reluctance to mix water and expensive IT assets. But that concern goes away once you understand the technology better, because in many cases, the liquid that’s close to the hardware isn’t actually water and can’t do any damage.
Modern immersion and some direct-to-chip liquid cooling systems, for example, use dielectric (non-conductive) non-flammable fluids, with standard cooling distribution units piping chilled water to a heat exchanger that removes heat from the immersion fluid. “That allows them to have the benefits of liquid cooling without having water right at the IT asset … so that if there is a leak, they’re not destroying millions of dollars’ worth of hardware,” Brown explains.
Impact on Facilities
In fact, he says, data centers that already use chilled water won’t become much more complex to manage by switching to liquid cooling. “They’re already accustomed to dealing with hydraulics and chillers, and worrying about maintaining the water treatment in the piping to keep the algae growth down – because if the water quality is low, it’s going to plug the tubes in the heat exchangers.” The water loop that cools immersion tanks can run under an existing raised floor without needing extra structural support.
If you don’t have that familiarity with running a water plant because you’re using direct expansion air conditioning units, Brown warns that liquid cooling will require a steeper learning curve and more changes to your data center operations. But that’s true of any chilled-water system.
Impact on IT
How disruptive liquid cooling will be for day-to-day IT work depends on the type of cooling technology you choose. Rear-door heat exchangers will require the fewest changes, says Dale Sartor, an engineer at the US Department of Energy’s Lawrence Berkeley National Laboratory who oversees the federal Center of Expertise for Data Centers. “There are plumbing connections on the rear door, but they’re flexible, so you can open and close the rear door pretty much the same way as you did before; you just have a thicker, heavier door, but otherwise servicing is pretty much the same.”
Similarly, for direct-to-chip cooling there’s a manifold in the back of the rack, with narrow tubes running into the server from the manifold and on to the components. Those tubes have dripless connectors, Sartor explains. “The technician pops the connector off the server, and it’s designed not to drip, so they can pull the server out as they would before.”
One problem to watch out for here is putting the connections back correctly. “You could easily reverse the tubes, so the supply water could be incorrectly connected to the return, or vice versa,” he warns. Some connectors are color-coded, but an industry group that includes Microsoft, Facebook, Google, and Intel is working on an open specification for liquid-cooled server racks that would introduce non-reversible plugs to avoid the issue. “The cold should only be able to connect up to the cold and the hot to the hot to eliminate that human error,” Sartor says.
Adjusting to Immersion
Immersion cooling does significantly change maintenance processes and the equipment needed, says Ted Barragy, manager of the advanced systems group at geosciences company CGG, which has been using GRC’s liquid immersion systems for more than five years.
If your server supplier hasn’t made all the changes before shipping, you may have to remove fans or reverse rails, so that motherboards hang down into the immersion fluid. For older systems with a BIOS that monitors the speed of cooling fans, cooling vendors like GRC offer fan emulator circuits, but newer BIOSes don’t require that.
Networking equipment isn’t always suitable for immersion, Barragy says. “Commodity fiber is plastic-based and clouds in the oil.” In practice, CGG has found that network devices don’t actually need liquid cooling, and its data center team now attaches them outside the tanks, freeing up space for more compute.
While CGG had some initial teething troubles with liquid cooling, Barragy is confident that the technology is reliable once you understand how to adjust your data center architecture and operations to take advantage of it. “The biggest barrier is psychological,” he says.
Gloves and Aprons
To replace components like memory in a server dipped in a tub of coolant, you have to remove the whole motherboard from the fluid – which is expensive enough that you don’t want to waste it and messy enough that you don’t want to spill it – and allow it to drain before you service it.
Barragy recommends wearing disposable nitrile gloves and remembers spilling oil down his legs the first time he worked with immersed components. Some technicians wear rubber aprons; others, who have more experience, do it in business-casual and don’t get a drop on them. “Once you’ve done it a few times, you learn the do’s and don’ts, like pulling the system out of the oil very slowly,” he says. “Pretty much anyone that does break-fix will master this.”
A bigger difference is that you’re going to be servicing IT equipment in a specialized area off the data center floor rather than working directly in the racks. You might have to replace an entire chassis and bring the replacement online before taking the original chassis away to replace or upgrade the components, Brown suggests.
“You want to do break-fix in batches,” Barragy agrees. His team will wait until they have four or five systems to work on before starting repairs, often leaving faulty servers offline for days, with failed jobs automatically requeued on other systems. To speed the process up, he recommends having a spare-parts kiosk.
There are relatively few suppliers of liquid-cooled systems to choose from, and until systems based on the upcoming open specification for liquid-cooled racks are on the market, you can’t mix and match vendors. “There is no interoperability,” Lawrence warns. “There are ten or 15 suppliers of immersive cooling and fewer of direct-to-chip [systems], and they tend to partner up with a hardware supplier. That means the ability to just choose the hardware you need is very limited once you're locked into a cooling provider.”
On the other hand, you don’t have to redo complex airflow dynamics calculations or figure out how to spread load across more racks if you want to increase power density. You can just switch from a 20kW to a 40kW tank and keep the same coolant and coolant distribution units.
Returns get somewhat more complicated and are best done in batches. “If you’ve got some broken components, you're going to let those drip dry for a couple of days,” Barragy explains. “They'll have an oil film on them, but you’re not going to wind up with a puddle of mineral oil on your hands.” Vendors who design motherboards for use in immersion systems will be comfortable dealing with components coming back in this condition, and CGG is able to process systems that reach end of life through their normal recycling channels.
Liquid cooling may mean extra work, but it also makes for a more pleasant working environment, says Scott Tease, executive director of high-performance computing and artificial intelligence at Lenovo’s Data Center Group. Heat is becoming a bigger problem than power in many data centers, with faster processors and accelerators coming in ever-smaller packages.
That means you need to move more and more air through each server. “The need for more air movement will drive up power consumption inside the server, and will also make air handlers and air conditioning work harder,” he says. Not only will it be hard to deliver enough cubic feet per minute of air, it will also be noisy.
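To put rough numbers on that point, there is a standard sea-level rule of thumb relating heat load to required airflow: CFM ≈ 3.16 × watts ÷ temperature rise in °F (the 3.16 constant folds in air density and specific heat). The sketch below is illustrative arithmetic, not a calculation from the article, and the example server figures are assumptions:

```python
def cfm_required(watts: float, delta_t_f: float) -> float:
    """Approximate airflow (cubic feet per minute) needed to remove a
    given heat load at sea level, for a given inlet-to-outlet air
    temperature rise in degrees Fahrenheit.

    Uses the common approximation CFM = 3.16 * W / dT(F), derived from
    air's density (~0.075 lb/ft^3) and specific heat (~0.24 BTU/lb-F).
    """
    return 3.16 * watts / delta_t_f

# A hypothetical 500 W 1U server with a 20 F air temperature rise:
print(round(cfm_required(500, 20), 1))   # -> 79.0 CFM

# Doubling the heat load at the same rise doubles the airflow needed,
# which is why denser servers need faster (and louder) fans.
print(round(cfm_required(1000, 20), 1))  # -> 158.0 CFM
```

The linear scaling is the crux of Tease’s argument: as per-server wattage climbs, the airflow (and fan power, and noise) must climb with it, while a liquid loop carries the same heat in a far smaller volume of coolant.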
The break-fix and first-level-fix IT staff at CGG now prefer to work in the immersion-cooled data center rather than the company’s other, air-cooled facility, which is state-of-the-art. “Once you learn the techniques so you don’t get the oil all over you, it’s a nicer data center, because it’s quiet and you can talk to people,” Barragy says. “The other data center with the 40mm high-speed fans is awful. It’s in the 80dB range.”
Liquid-cooled data centers also have more comfortable air temperature for the staff working inside. “A lot of work in data centers is done from the rear of the cabinet, where the hot air is exhausted, and those hot aisles can get to significant temperatures that are not very comfortable for people to work in,” Brown says. “The cold aisles get down to pretty cold temperatures, and that's not comfortable either.”