If the public cloud has an Achilles’ heel, it’s the limited ability of each of its networks as a whole to behave deterministically and predictably. That’s not always important when storing and recalling names from a list, but when training a machine learning algorithm to recognize a face in a crowd or a virus in a blood sample, determinism could make the difference between pinpoint accuracy and random luck. Ironically, it’s the cloud’s reliance on virtualization, the layer that abstracts software from hardware, that gets in AI’s way.
It’s a shortcoming that the colocation industry has sought to leverage for competitive advantage. In an announcement Tuesday morning, colo leader Digital Realty Trust launched a joint deployment of Nvidia’s DGX servers, running multiple GPUs in parallel, on Core Scientific’s Plexus GPU orchestration platform.
Their objective: make a virtual supercomputer accessible to colo customers worldwide, at the speed of light.
“The immense computing power within the DGX A100 by itself is not enough,” admitted Tony Paikeday, Nvidia’s senior director of product marketing for AI systems, speaking with Data Center Knowledge. “We need infrastructure solutions for enterprises that they can deploy in a turnkey way.”
Virtual Proximity Without the Virtualization
The three partners will begin their launch at Digital Realty’s Cloud House, a five-story, 120,000-square-foot facility at the center of the U-shaped bend in the Thames that forms the London Docklands. Cloud House is one station in the San Francisco-based data center provider’s “Digital Docklands,” interconnected by its Connected Campus fiber network. The hub of this network is a short walk away, at Sovereign House, a 54,000-square-foot data center just across the Thames from the O2 entertainment district.
What they’re seeking to build is in one important respect not a cloud platform, despite being headquartered at “Cloud House.” The compute infrastructure here is bare metal, without the software virtualization layer that typically supports workloads in the cloud, and without the network virtualization layer that would move those workloads dynamically. It also isn’t colocation, at least not directly. Its customers, which may in this case not only include enterprises but also academic and research institutions, are deploying workloads on pre-installed infrastructure in Digital Realty’s facilities.
But it does open up a market for something that certain classes of Digital’s colocation customers might actually need: very high-performance processing, with arguably supercomputer scale, perhaps interconnected with servers in Digital Realty data center space they’re leasing.
“Interconnection is absolutely critical,” asserted Digital Realty CTO Chris Sharp. For example, he told us, a customer with assets in Digital’s 100MW facility in Ashburn could position a single cabinet “right next to the core compute and storage of some of the largest hyperscalers in the world.”
In June, Nvidia reconfigured its own supercomputer with a cluster of 140 DGX A100 units for what it called a “SuperPod,” boosting its performance to achieve the #7 ranking in the Top500 list of the world’s fastest supercomputers.
Maybe organizations don’t need the processing power of a #7 supercomputer — certainly not full-time. But Digital Realty is betting on the possibility that they may perceive proximity to such a system, or even to just a piece of it, as elevating the real estate value of colocation facilities that utilize it. By extension, then, high-speed interconnection through the Connected Campus and Digital’s other services, such as SX Fabric, could become the next best thing to proximity itself.
“It’s a huge problem, a lot of which stems back to the origins of public cloud being very virtualized,” explained Ian Ferreira, chief AI product officer for Bellevue, Washington-based Core Scientific. The containerization model, he believes, is well-suited for general-purpose enterprise workloads “but is a challenge for anything that approximates high-performance computing.”
Bending away from the trend toward high distribution, Ferreira advocates a deployment model that is more “roof-local” and “rack-local.” This model’s aim, he explained, is “to make sure that the data is in the same building, if not on the same rack, as the compute.”
Core Scientific’s customer base includes data scientists at organizations that need to train their AI models in real-time or near-real-time based on high-bandwidth inputs. Nvidia’s Paikeday offered this example: Unmanned drones can be deployed throughout a multi-story manufacturing complex, searching for strange climate conditions, uncatalogued sources of wind or airflow, or incidents of corrosion on metal parts. Training an AI system to spot something suspicious that may come from a drone involves real-time simulated data traffic at the speed drones would transmit it.
Multiple parallel GPUs can process this data in thousands of parallel streams at unfathomable speeds. But there’s still a bottleneck, said Ferreira: populating each of the GPUs in the cluster may yet be a sequential process. There need to be 200 Gbps InfiniBand connections between storage and compute arrays. Otherwise, he said, there’s no point in running the GPUs at or near 100 percent utilization.
“Now, if you do this in the public cloud, the challenges are that you’re in a highly multi-tenant environment, and the performance of said storage with said compute varies based on how many other tenants are in the same facility, and what’s going on in the network,” Ferreira continued. “So, having deterministic, dedicated hardware that’s bare-metal takes away that variable performance.”
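The bandwidth side of Ferreira's argument can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: the function name, the GPU count, and the per-GPU ingest rate are hypothetical illustrations, not figures from Nvidia or Core Scientific. It estimates the ceiling on average GPU utilization when the storage link is the only constraint.

```python
def max_gpu_utilization(link_gbps: float, gpus: int, per_gpu_gbytes_per_s: float) -> float:
    """Upper bound on average GPU utilization if the storage link is the sole limit.

    link_gbps            -- storage-to-compute link speed in gigabits per second
    gpus                 -- number of GPUs fed over that link (hypothetical)
    per_gpu_gbytes_per_s -- data each GPU must ingest to stay busy (hypothetical)
    """
    link_gbytes_per_s = link_gbps / 8.0          # bits to bytes
    demand = gpus * per_gpu_gbytes_per_s         # aggregate demand in GB/s
    if demand <= 0:
        return 1.0                               # nothing to feed, so no link limit
    return min(1.0, link_gbytes_per_s / demand)  # cannot exceed 100 percent


if __name__ == "__main__":
    # A 200 Gbps link carries 25 GB/s. Eight GPUs each wanting an assumed
    # 5 GB/s would need 40 GB/s, so the link caps utilization at 62.5 percent.
    print(max_gpu_utilization(200, 8, 5.0))   # 0.625
```

Under these assumed numbers, even a dedicated 200 Gbps link leaves the GPUs waiting more than a third of the time. A shared link whose effective speed drops with other tenants' traffic would lower that ceiling further, and unpredictably.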
The Gang Scheduling Paradox
Core Scientific presents its Plexus platform as a kind of cloud for staging and orchestrating highly parallel HPC workloads. In addition to Kubernetes as an orchestrator, it offers Slurm, a scheduler and workload manager familiar to data scientists who monitor and oversee these workloads at a per-processor level rather than at an abstract “node” level, treating CPUs and GPUs as distinct entities.
For instance, Slurm utilizes a unique orchestration mode called “gang scheduling.” Here, multiple parallel workloads may be assigned as jobs to a cluster of GPUs simultaneously. The scheduler alternates jobs’ access to processors intermittently, using a time slicing method similar to how UNIX multitasked in the 1970s and ’80s.
That kind of time slicing control makes extensive use of data swapping in and out of memory. And it’s where the bottleneck that Ferreira mentioned can fold in on itself, compounding the damage.
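To see how swapping compounds the cost of time slicing, consider the toy model below. It is a simplified illustration, not Slurm's implementation or configuration: the function name, job names, slice length, and swap cost are all hypothetical. Jobs take turns holding the whole GPU group for one time slice, and every hand-off from one job to another pays a fixed swap penalty.

```python
from collections import deque


def gang_schedule(jobs, slice_s, swap_s):
    """Toy round-robin time slicing over one shared group of GPUs.

    jobs    -- dict mapping job name to compute seconds it needs (hypothetical)
    slice_s -- length of one time slice in seconds
    swap_s  -- assumed cost of swapping one job's data out and the next one's in
    Returns (timeline, wall_clock_seconds); timeline holds (job, start, end).
    """
    queue = deque(jobs.items())
    clock = 0.0
    timeline = []
    current = None
    while queue:
        name, remaining = queue.popleft()
        if current is not None and current != name:
            clock += swap_s                      # pay the swap cost on every hand-off
        run = min(slice_s, remaining)
        timeline.append((name, clock, clock + run))
        clock += run
        remaining -= run
        current = name
        if remaining > 0:
            queue.append((name, remaining))      # unfinished job goes to the back
    return timeline, clock


if __name__ == "__main__":
    timeline, total = gang_schedule({"train_a": 20.0, "train_b": 10.0},
                                    slice_s=10.0, swap_s=2.0)
    for name, start, end in timeline:
        print(f"{name}: {start:.0f}s to {end:.0f}s")
    print(total)   # 34.0 seconds of wall clock for 30 seconds of compute
```

In this model, the two hand-offs add four seconds to thirty seconds of useful work. If the swap cost itself depends on a slow or contended path to storage, each extra slice multiplies the penalty, which is the compounding effect described above.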
The solution he clearly argues for is adjacency, bringing storage and compute assets so close to one another, they could hug.
With Connected Campus, explained Digital Realty’s Sharp, “now it may not be within that single building, but we have bulk conduit and multiple dark fiber routes where customers can light up multiple technologies on the shortest path within multiple buildings. Then you have multiple roofs. And you further equip your interconnections to reach out to that edge environment.”
But couldn’t Ferreira’s solution be implemented by just deploying higher capacity in one location?
“I don’t think that’s the case,” Sharp responded. “When you’re shrinking time and distance, you can’t just throw speeds [out there] to overcome these challenges. That’s the underpinning of ‘data gravity:’ You really have to move the physical nature of the infrastructure closer, because no matter how high the throughput or capacity, there’s still a lot of lag, and it really impedes your ability to achieve a deterministic environment.”
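Sharp's point about distance can be illustrated with simple physics. The sketch below assumes light travels through glass fiber at roughly 200,000 km per second, about two-thirds of its speed in a vacuum. The function name, distances, and payload sizes are hypothetical. It separates the time a transfer spends on the wire (propagation) from the time spent pushing bits onto it (serialization).

```python
FIBER_KM_PER_S = 200_000.0  # approximate speed of light in glass fiber


def transfer_time_s(distance_km, payload_bytes, link_gbps, round_trips=1):
    """Rough time for a request/response exchange over a fiber link.

    Ignores switching, queuing, and protocol overhead (all assumed zero here).
    """
    propagation = round_trips * 2 * distance_km / FIBER_KM_PER_S  # there and back
    serialization = payload_bytes * 8 / (link_gbps * 1e9)         # bits over bits/s
    return propagation + serialization


if __name__ == "__main__":
    payload = 1_000_000  # a hypothetical 1 MB read
    for gbps in (100, 200, 400):
        far = transfer_time_s(100, payload, gbps)    # 100 km away
        near = transfer_time_s(0.1, payload, gbps)   # same building
        print(f"{gbps} Gbps: far {far * 1e3:.3f} ms, near {near * 1e3:.3f} ms")
```

However fast the link, the 100 km exchange never drops below the one millisecond it takes light to make the round trip, while the in-building exchange is dominated by serialization and keeps shrinking as bandwidth grows. Only shortening the path removes the fixed delay.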
Core Scientific’s Ferreira offered this reconciliation: A roof-local solution — for instance, using assets located only in Cloud House — may be employed in training an ML model with a data set. A backend cross-connect moves data from its “cage,” as he puts it, to the compute cluster cage.
“Once that model’s trained,” he suggested, “now I’m going to deploy it on an EGX cluster across multiple edge facilities. We make that possible through a single pane of glass.” A trained model can then be deployed to multiple interconnected Digital Realty facilities with EGX clusters, including a growing number of smaller-footprint, more remote facilities linked using Nvidia’s Edge AI platform, announced last May.
“Core is enabling [Nvidia] to industrialize this entire workflow,” remarked Ferreira, “and bring AI model development into DevOps. AI model development is done by, as I call them, ‘data science artisans’ — people who are not experts in code, and certainly not in robustness, scale, versioning, accountability, or any of that stuff. The beauty of the Core solution is they can implement this over any infrastructure.”
The joint platform should be made available to Digital Realty and Core Scientific customers in the coming days.