Data Center Rack Density Has Doubled. And It's Still Not Enough
Can solutions like direct-to-chip cooling and rear-door heat exchangers bridge the gap and help data centers meet the overwhelming demands of the AI revolution?
April 15, 2024
One thing is abundantly clear: Every single data center will become an AI data center. And the critical differentiator will be how quickly they can get there.
I've had the pleasure of writing for Data Center Knowledge for some time, and I have also authored the AFCOM State of the Data Center Report for the past eight years. Over those years, I've seen this industry grow and shift to accommodate new trends. However, nothing has been as exciting, or as ground-shifting, as what we've seen over the past year. The question is: Can rack density keep pace?
To contextualize today’s rack density conundrum, I’d like to take us back. Eight years ago, in our first-ever State of the Data Center report, respondents indicated that their average density was 6.1 kilowatts per rack. Given the workloads data centers typically supported at the time, that figure was about what you'd expect. While high-density applications existed, most data centers were still running traditional applications like email servers, databases, and other business-critical services.
Over the past year, everything has changed. In the 2024 report, respondents indicated that average rack density had increased to 12 kilowatts. Most respondents (60%) are actively working to increase density in their racks, primarily by improving airflow (58%), followed by containment (42%) and liquid cooling (40%). I'm sure you can guess what's driving all of this. Per the report, most respondents (53%) believe new AI workloads (generative AI) will "definitely" increase capacity requirements for the colocation industry.
Here's the crazy part: Even though density has doubled, at least per our report, it's still insufficient to support AI and high-density architectures. To put this in perspective, a single NVIDIA DGX H100 system can consume up to 10.2 kilowatts. Based on our findings, a traditional data center could support only one of these high-end units per rack, despite the growth in density. This rapid pace of evolution has become the driving force of innovation in our industry. The most significant difference is that it's happening incredibly fast.
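For a quick sense of the math, here is an illustrative back-of-the-envelope sketch (in Python) using the report's density figures and the roughly 10.2 kW maximum per DGX H100 system cited above. It ignores networking, storage, and headroom, so treat it as a thought experiment rather than a capacity plan.

```python
# Back-of-the-envelope check: how many DGX H100 systems fit within a
# rack's power budget? Uses the ~10.2 kW per-system figure cited above;
# real deployments also need headroom for networking, storage, and
# cooling overhead.

DGX_H100_KW = 10.2  # approximate maximum draw of one DGX H100 system

def systems_per_rack(rack_budget_kw: float, system_kw: float = DGX_H100_KW) -> int:
    """Whole number of systems a given rack power budget can support."""
    return int(rack_budget_kw // system_kw)

for budget_kw in (6.1, 12, 40, 60):
    print(f"{budget_kw:>5.1f} kW rack -> {systems_per_rack(budget_kw)} DGX H100 system(s)")
```

At the 2016 average of 6.1 kW, a rack cannot host even one such system; at today's 12 kW average, it can host exactly one.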
Are Liquid Cooling and Rear-Door Heat Exchangers the Answer?
One of my mentors, Peter Gross, once said, "The data center industry loves innovation as long as it's ten years old." The challenge is that we certainly don't have ten years. We don't even have ten months anymore.
Something has to change when it comes to density. We are now asking data center operators to move from supporting 6-12 kilowatts per rack to 40, 50, 60, and even more kilowatts per rack. While airflow and containment are excellent methods for improving efficiency and density, we are quickly approaching the physical limits of air cooling.
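To illustrate why, here is a minimal sensible-heat sketch comparing air and water as heat-transport media. It assumes typical values (sea-level air, a 15°C air-side temperature rise, and a 10°C water-side rise); actual designs depend on containment, supply temperatures, and coil performance, so the numbers are directional only.

```python
# Illustrative only: sensible-heat math comparing how much air versus
# water it takes to carry a rack's heat load. Assumes sea-level air
# (~1.2 kg/m^3, cp ~1.005 kJ/kg.K) with a 15 C rise, and water
# (cp ~4.186 kJ/kg.K, ~1 kg per liter) with a 10 C rise.

AIR_DENSITY, AIR_CP, AIR_DT = 1.2, 1.005, 15.0   # kg/m^3, kJ/(kg*K), K
WATER_CP, WATER_DT = 4.186, 10.0                 # kJ/(kg*K), K
M3S_TO_CFM, LPS_TO_GPM = 2118.88, 15.85          # unit conversions

def airflow_cfm(rack_kw: float) -> float:
    """Airflow needed to remove rack_kw of heat at the assumed delta-T."""
    return rack_kw / (AIR_DENSITY * AIR_CP * AIR_DT) * M3S_TO_CFM

def water_gpm(rack_kw: float) -> float:
    """Water flow needed to remove the same heat at the assumed delta-T."""
    return rack_kw / (WATER_CP * WATER_DT) * LPS_TO_GPM  # ~1 kg per liter

for kw in (12, 40, 60):
    print(f"{kw:>2} kW rack: ~{airflow_cfm(kw):,.0f} CFM of air "
          f"vs. ~{water_gpm(kw):.0f} GPM of water")
```

Moving tens of kilowatts with air means pushing thousands of cubic feet of it through a single rack every minute, while water can carry the same load at a couple dozen gallons per minute. That gap is the physics pushing the industry toward liquid.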
So, the next logical step is to turn to liquid cooling. In full transparency, I've spent the last year working in generative AI and inference, specifically focusing on the data center and colocation space. We've taken data center partners on an extraordinary journey to become more prepared to support generative AI and high-density use cases. The good news is that rack manufacturers, direct-to-chip liquid cooling technologies, and solutions like rear-door heat exchangers have come a long way.
According to Vertiv’s post, "Unlike air cooling, which continues to work ever harder, the cooling mechanism of a rear door heat exchanger or direct chip liquid cooling solution produces better cooling results with less work, leading to less energy use and fewer carbon emissions. These technologies could also be used together to drive 100% of the heat load into the fluid."
“The proliferation of increasingly dense AI workloads will usher in the age of liquid cooling,” said Brad Wilson, VP Technology at Vertiv.
“While direct-to-chip cooling will ultimately represent the most significant cooling efficiency increase since the introduction of the PUE metric, rear-door heat exchangers are an effective and energy-efficient solution for medium or high-density applications – including existing air-cooled data centers that are looking for a liquid cooling strategy,” Wilson said.
I'm a big fan of rear-door heat exchangers: radiator-like doors attached to the back of racks that circulate chilled water or coolant to pull heat directly out of the servers' exhaust air. These are wonderful solutions that can be integrated into traditional colocation architectures, meaning you do not have to rip and replace your entire ecosystem to support high-density architecture. In fact, that's precisely the path one data center leader took.
Stepping on the 'GaaS' – Building a Facility to Support Over 3,000 H100 GPUs
To support the latest use cases around AI, we must move beyond the hype and look at actual implementations. So, let's look at someone who has executed that vision.
At Data Center World 2024, Ken Moreano, President and CEO of Scott Data, will present on the creation of a data center spanning 110,000 square feet, equipped to handle more than 3,000 NVIDIA H100 GPUs. His talk will cover the transition of a facility from a Department of Defense SCIF (Sensitive Compartmented Information Facility) into a data center that meets the Uptime Institute's Tier III standards, emphasizing the incorporation of technology services that facilitate GPU as a Service (GaaS) and large-scale HPC colocation.
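To put that scale in rough perspective, here is a hypothetical estimate that assumes the GPUs are deployed as 8-GPU, DGX-class systems at the roughly 10.2 kW figure cited earlier; Scott Data's actual server configuration and power budget may differ.

```python
# Rough scale check, not Scott Data's actual bill of materials: if
# 3,000+ H100 GPUs were deployed as 8-GPU, DGX-class systems drawing
# up to ~10.2 kW each, the GPU servers alone would represent several
# megawatts of IT load before networking, storage, and cooling.

GPU_COUNT = 3000           # "more than 3,000" per the presentation
GPUS_PER_SYSTEM = 8        # a DGX H100 houses eight H100 GPUs
SYSTEM_KW = 10.2           # approximate maximum draw per system

systems = -(-GPU_COUNT // GPUS_PER_SYSTEM)   # ceiling division
it_load_mw = systems * SYSTEM_KW / 1000

print(f"~{systems} systems -> ~{it_load_mw:.1f} MW of GPU server IT load")
```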
“Our successful operating history, along with our entrepreneurial culture, positioned Scott Data to leverage early market signals for intentional investments in our infrastructure,” says Ken Moreano, President and CEO of Scott Data Center. “This investment not only demonstrated a solid business case but also reaffirmed our longer-term core values of being a technical innovator and leader in our market space.”
In speaking with Ken, we learned that he was motivated by the market and his enterprise clients. Many businesses already have their most critical services with their colocation partners, and the next logical step is to support high-density AI workloads. However, the biggest challenge was that many of these data center partners were not ready or equipped to support high-density workloads. That's why Scott Data Center went on this journey.
And, after seeing how many data center folks attended NVIDIA's event, I won't be surprised if more facilities rapidly expand their high-density computing capabilities.
Looking Ahead and Executing a New Vision
It's important to note that what we're experiencing as an industry is much more than a technology shift. What we're seeing is a shift in how humanity interacts with data. For the first time, we can ask questions of our data and get a seemingly "conscious" answer. Original content is generated based on our request. Behind that is an extraordinary amount of computing to train large language models and run inference. Our facilities will be at the heart of this revolution. Your task will be to find creative, innovative, and sustainable ways to support this new era of digital infrastructure.
Let me give you an example. Recently, the manufacturer Vertiv advised us to "Start with the rear doors, then consider direct-to-chip cooling." They're absolutely right in that approach. Even NVIDIA CEO Jensen Huang, speaking at the 2024 SIEPR Economic Summit, stated that NVIDIA's next-generation DGX servers will be liquid-cooled.