ScaleMatrix and Nvidia's New AI and HPC Appliance Doesn't Need a Data Center

AI Anywhere models crunch 8 petaFLOPS and 13 petaFLOPS inside self-sufficient racks, thanks to proprietary cooling tech.

Christine Hall

November 19, 2019

The DDC cabinets and the micro-chiller that are part of the AI Anywhere HPC appliance, which can be operated at any location. (Credit: ScaleMatrix)

Just two months ago we wrote, "One does not simply buy a supercomputer." But in the rapidly changing world of IT, what was true in September may no longer be true in November.

Today at SC19, the supercomputing conference being held in Denver, data center provider ScaleMatrix introduced appliances that can deliver up to 13 petaFLOPS of performance. With cooling built in, they're plug-and-play out of the box and don't need to be housed in a specially designed data center.

This could be a game changer. Most high performance computing systems required for running machine learning and other AI workloads can't be located in a typical data center without major modifications to the facility's power distribution and cooling system. Being GPU-intensive, HPC systems can bring rack density up to about 30kW per rack, at least five times higher than the average data center load of 3kW to 5kW per rack.

But ScaleMatrix's new appliance is self-sufficient.

"All we need is a roof, floor space, and a place to plug the appliance in, and we can turn on an enterprise-class data center capable of supporting a significant artificial intelligence or high performance computing workload," Chris Orlando, ScaleMatrix's co-founder and CEO, told DCK.

Called "AI Anywhere," the product was developed in a three-way collaboration between ScaleMatrix, which operates high-density colocation data centers for AI workloads in Houston and San Diego; chipmaker Nvidia; and Microway, a provider of computer clusters, servers, and workstations for HPC and AI. It's available in two single-rack versions, each employing one of Nvidia's two DGX supercomputer models, designed specifically for machine learning and AI workloads.

One model contains 13 DGX-1 units, delivering a payload of 13 petaFLOPS, with the other containing four DGX-2 systems, delivering 8 petaFLOPS. Both units adhere to DGX-POD reference architecture designs (Nvidia's design for building GPU-accelerated AI devices) and include the full NVIDIA DGX software stack, deep learning and AI framework containers, NetApp ONTAP storage, and Mellanox switching.
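The per-system performance implied by those rack totals can be checked with quick arithmetic. The sketch below is illustrative only: the names `RACKS` and `petaflops_per_system` are hypothetical, and the only inputs are the unit counts and rack totals quoted above.

```python
# Rack configurations as quoted above: (number of DGX systems, total petaFLOPS).
RACKS = {
    "DGX-1": (13, 13.0),
    "DGX-2": (4, 8.0),
}


def petaflops_per_system(units: int, total_petaflops: float) -> float:
    """Return the throughput of a single system implied by the rack total."""
    return total_petaflops / units


for model, (units, total) in RACKS.items():
    per_system = petaflops_per_system(units, total)
    print(f"{model}: {per_system:.1f} petaFLOPS per system")
```

On these figures, each DGX-1 contributes about 1 petaFLOPS and each DGX-2 about 2 petaFLOPS.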

"Any enterprise that wants to be a supercomputing enterprise could have never imagined deploying the scale of infrastructure that they can support now with this solution," Tony Paikeday, director of product marketing of AI and deep learning at Nvidia, told us. "Prior to this they would have needed an AI-ready data center, the kind of facility that is optimized for the power and cooling demand of these accelerated computing systems. Now you can literally drop a supercomputing facility in places that would have been unimaginable before."

The secret sauce that makes these plug-and-play supercomputers possible is in the cooling.

With the DGX-1 version consuming 42kW and the DGX-2 version running at 43kW, a single rack generates more heat than most well-equipped data centers can handle. The appliances use ScaleMatrix's proprietary closed-loop chilled water-assisted forced-air cooling system -- the same design cooling ScaleMatrix's data centers -- with chilled water circulating through racks designed and built by the ScaleMatrix subsidiary DDC.
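To put those power figures in context, here is a rough comparison against the 3kW-to-5kW average rack load cited earlier. It is a back-of-the-envelope sketch, not a vendor calculation; the names `AVERAGE_RACK_KW`, `APPLIANCE_KW`, and `density_multiple` are hypothetical.

```python
# Figures quoted in the article, in kW per rack.
AVERAGE_RACK_KW = (3.0, 5.0)  # typical data center range (low, high)
APPLIANCE_KW = {"DGX-1 version": 42.0, "DGX-2 version": 43.0}


def density_multiple(rack_kw: float, baseline_kw: float) -> float:
    """Return how many times denser a rack is than a baseline rack."""
    return rack_kw / baseline_kw


for name, kw in APPLIANCE_KW.items():
    # Dividing by the high end of the average gives the smaller multiple.
    low = density_multiple(kw, AVERAGE_RACK_KW[1])
    high = density_multiple(kw, AVERAGE_RACK_KW[0])
    print(f"{name}: {low:.1f}x to {high:.1f}x the average rack load")
```

By this measure, either appliance draws roughly 8 to 14 times the power of an average rack.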

A separate "micro-chiller" unit sits next to the rack, cooling water for the AI Anywhere systems.

Not only can this hybrid air-and-water approach cool efficiently -- it uses in-rack sensors to direct cool air where it's needed -- it does so without the risk associated with bringing liquid directly to silicon, Orlando said.

"Where the water comes in and out and does the thermal exchange, that area is sealed off from the rest of the cabinet," he said, "so we're bringing all the efficiency of water cooling to the cabinet without introducing any of the risk."

And the design can cool much higher densities than those of the two DGX-based solutions, he said. DDC recently introduced a rack that can handle up to 85kW using the same cooling system.

According to Paikeday, AI Anywhere addresses a need that Nvidia has been observing for a while.

"Customers are deploying larger and larger infrastructures to either tangle with more complex AI problems like natural language processing, or they're doing a consolidation play of trying to take stranded AI platform investments, kind of like 'Shadow AI,' that are sprawling across their enterprise and bring them under one roof," he said. "The question that inevitably comes back from most of these customers is, I'd love to do this but I'd have to have a data center and I'm getting out of the data center business -- I'm not trying to put more CapEx back into my data center.

"So this is now a perfect way to remove that last-mile barrier of how to get this kind of computing power into their hands."

The devices will be marketed only as AI Anywhere and won't carry the logo of any of the partners as a master brand, although the individual components used in the appliance will be branded.

"The cabinets will be branded DDC. Microway is the delivery and services partner, and Nvidia, NetApp, and Mellanox infrastructure will each have their own logos. The ScaleMatrix cabinet exterior will be marked with AI Anywhere," ScaleMatrix said in response to our query.

About the Author(s)

Christine Hall

Freelance author

Christine Hall has been a journalist since 1971. In 2001 she began writing a weekly consumer computer column and began covering IT full time in 2002, focusing on Linux and open source software. Since 2010 she's published and edited the website FOSS Force. Follow her on Twitter: @BrideOfLinux.
