It’s no secret that between air and water, water is the more efficient medium for cooling the mightiest of servers. Using liquid to carry heat away from computer chips is a common data center cooling method in the world of supercomputers. Today, as some internet-based services develop more complex backend capabilities, such as Big Data analytics or machine learning, the data centers that host them are taking cues from supercomputing facilities.
One example is eBay. A special unit within Dell that makes custom tech for operators of the world’s largest data centers has designed a water-based system for cooling custom server chips it developed together with Intel Corp. and eBay itself.
The system is different from typical liquid cooling solutions, however. It brings water from the facility’s cooling towers directly to every chip inside server chassis. There are no central distribution units, which typically sit between cooling towers and server racks in liquid-cooled data centers.
Another atypical aspect of the design, codenamed Triton, is its ability to use water that’s warmer than usual. The system, now deployed at one of eBay’s data centers, uses water that’s 33 degrees Celsius – because the customized CPUs run at very high frequency – but if Triton is used with lower-power CPUs, supply water temperature can be as high as 60 degrees Celsius, Austin Shelnutt, principal thermal engineer at Dell, said.
Cranking Up the CPU
The eBay processor that needs all the cooling it can get is a modified version of the chips in Intel’s latest Xeon E5 v4 family. The highest-performing off-the-shelf part in the family has 22 cores and thermal design power (TDP) of 145W. eBay’s chip, modified to run at higher frequency, has 20 cores and TDP of 200W.
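To put those two parts side by side, the short Python sketch below divides each TDP by its core count. It uses only the figures quoted above; the function and variable names are illustrative, and watts per core is a rough average rather than a measured value, since TDP also covers parts of the chip outside the cores.

```python
def watts_per_core(tdp_watts: float, cores: int) -> float:
    """Average thermal design power per core (a rough proxy, not a measurement)."""
    return tdp_watts / cores


# Figures quoted in the article.
stock = watts_per_core(145, 22)   # highest-performing off-the-shelf Xeon E5 v4 part
custom = watts_per_core(200, 20)  # eBay's modified, higher-frequency chip

# Relative increase in average heat per core that the cooling system must remove.
increase_pct = (custom / stock - 1) * 100

print(f"Off-the-shelf: {stock:.2f} W per core")
print(f"eBay custom:   {custom:.2f} W per core")
print(f"Increase:      {increase_pct:.0f}%")
```

By this rough measure, each core in the custom chip gives off about half again as much heat as a core in the off-the-shelf part.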
Intel has been designing custom chips for hyperscale data center operators like eBay, Facebook, and Amazon for several years, and this business really ramped up starting about three years ago. Some of it was driven by cloud providers, such as Amazon, who wanted to launch more and more powerful types of cloud servers.
Rack-Scale Architecture, 21-Inch Form Factor
Triton is a rack-scale architecture, meaning all components that go into a rack, including servers, are designed holistically as a single system. This is a different architectural approach developed specifically for hyperscale data centers.
A rack filled with servers being cooled by Dell's liquid cooling system Triton in a lab (Photo: Dell)
Vendors have traditionally focused on designing individual self-sufficient boxes, be they servers, storage units, or network switches. In rack-scale architecture, basic resources, such as power, cooling, or network connectivity, can be shared among the nodes for efficiency, and components like memory cards or CPUs can be swapped out individually, without the need to replace entire servers.
Triton uses 21-inch-wide server chassis, similar to the racks and chassis Facebook developed and open sourced through the Open Compute Project. It is not, however, an OCP design, Shelnutt pointed out.
Facility Water Directly to Chip
Water in the system travels from the cooling tower to the rack, and individual pipes bring it inside every chassis and to a cold plate that sits on top of every CPU. The only pumps in the system are the facility pumps that push water between the cooling towers and the building.
Copper pipes carry facility water directly to cold plates on top of CPUs inside every server chassis (Photo: Dell)
Triton has neither the additional pumps that sit between mechanical chillers and computer room air handlers in air-cooled data centers nor the central distribution units of traditional liquid-cooled data centers, which makes for a very energy efficient system. “Energy required to cool CPUs in the rack is zero,” Shelnutt said.
Triton’s Power Usage Effectiveness, or PUE, is 1.03, according to internal analysis by Dell. That’s compared to the industry average data center PUE of 1.7, according to the most recently available data from a 2014 survey by the Uptime Institute.
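PUE is the ratio of a facility's total power draw to the power consumed by its IT equipment, so the overhead spent on cooling, power distribution, and everything else equals the IT load multiplied by (PUE - 1). The Python sketch below applies that to the two PUE figures above. The 1,000kW IT load is a hypothetical number chosen for illustration and does not come from Dell or eBay.

```python
def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Non-IT power implied by a PUE: total = IT * PUE, so overhead = IT * (PUE - 1)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be lower than 1.0")
    return it_load_kw * (pue - 1.0)


IT_LOAD_KW = 1000.0  # hypothetical IT load, an assumption for illustration only

triton = overhead_kw(IT_LOAD_KW, 1.03)   # Dell's internal figure for Triton
average = overhead_kw(IT_LOAD_KW, 1.7)   # Uptime Institute 2014 survey average

print(f"Triton overhead:           {triton:.0f} kW")
print(f"Industry-average overhead: {average:.0f} kW")
```

Under that assumed load, a facility at the industry-average PUE would spend roughly 700kW on overhead, against about 30kW at Triton's reported PUE.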
Dell addressed the common worry about bringing water into expensive electronics with “extreme testing” of the welded copper pipes, using high-pressure simulations in which it pumped water at more than 350 PSI. The system’s normal supply-water pressure is 70 PSI.
Each server, chassis, and rack has a leak-detection mechanism and an emergency shut-off device. The system borrows a dripless disconnect design from military applications, according to Dell.
Dell's liquid cooling system Triton features a dripless disconnect design used in military applications (Photo: Dell)
Hyperscale Doesn’t Always Mean High Density
Not all hyperscale data center operators shoot for high power density like eBay does. Some of them – Facebook, for example – prefer highly distributed low-density systems working in parallel. In an earlier interview, Jason Taylor, Facebook’s VP of infrastructure, told us that the average power density in the social network’s data centers is around 5.5kW per rack. It may have some servers that require 10kW or 12kW per rack and some that only take about 4.5kW, but the facilities are designed for low power density overall.
“We definitely see it both ways,” Shelnutt said. “We have customers on both sides of that fence.”
Where a company falls on this low-to-high-density continuum depends a lot on the nature of its application. If the application requires a lot of compute close to dense population centers, for example, the company may be inclined to go for higher density, because real estate costs or tax rates may be higher in those areas, he explained. There is a long list of factors that affect these design decisions, and “not every customer derives the same benefit from tweaking the same knobs.”
Shelnutt’s colleague Jyeh Gan, director of product marketing and strategy for Dell’s Extreme Scale Infrastructure Unit, said their team is starting to see more demand for high-frequency, high-core-count CPUs and alternative cooling solutions among hyperscale data center operators.
Shelnutt and Gan’s unit, whose name is abbreviated as ESI, was officially formed about six months ago. It’s tasked with developing custom data center solutions for hyperscale technology companies.
Those customizations may be as simple as adding extra SSD slots to an existing server design or as involved as designing an entire data center cooling system, Gan said.
But the unit’s focus is solely on the biggest of customers. Something like eBay’s Triton is not available off the shelf to any customer, but if a company with a big enough requirement wants it, the ESI unit is where it would turn.
Corrected: The custom Intel Xeon chip cooled by the Triton system has 20 cores, not 24 as the article previously stated. Also, Triton's PUE is 1.03, not 1.06.