Nvidia DGX A100 supercomputer cluster installed at Argonne National Laboratory to fight COVID-19 (Argonne National Laboratory)

Could Nvidia’s $40B Arm Gamble Get Stuck at the Edge?

What has to happen before its vision of an entirely new data center architecture comes to fruition?

It’s being called the most lucrative edge computing opportunity presently under development: the “edge cloud” deployment model. This month has already seen major steps forward in that model’s development. First came VMware’s launch of Project Monterey, which would extend vSphere virtual infrastructure to disaggregated servers equipped with the latest SmartNICs or Nvidia DPUs.

Last week Arm, the would-be Nvidia division, added to that news with renewed investment in Project Cassini, a certification program and accompanying platform begun last year that enables a standardized software deployment model across Arm-based edge devices.

Yet in a curious way, it may be the edge that’s the problem here. Nvidia has made its intent clear: it wants its Data Processing Units (DPUs, SmartNICs that also serve as workload accelerators) to change the fundamental template of the x86 server, disaggregating data plane and stream processing from the principal logic of the CPU. In other words, it wants another class of processor like the GPU, with Nvidia holding a built-in advantage, and one that would extend everywhere, not just to the edge.

“The entire data center is software-programmable and can be programmed as a service,” Nvidia CEO Jensen Huang said during a company conference keynote this month taped in his kitchen. “But all the data center infrastructure processing and software is a huge tax on CPUs. As more users load to hyperscalers, each microservice comes with the associated virtualization, networking, storage, and security processing — all of it consuming CPU resources. A new type of processor is needed that is designed for data movement and security processing.”

Nvidia CEO Jensen Huang, delivering the GTC 2020 keynote from his kitchen. (Nvidia)
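
Huang’s “tax” argument lends itself to a quick sanity check. Here is a back-of-envelope sketch in Python; every figure in it is an assumption chosen for illustration, not a measurement from Nvidia or any hyperscaler:

```python
# Back-of-envelope estimate of the CPU "infrastructure tax" a DPU could
# reclaim. All inputs are hypothetical placeholders, not measured values.

CORES_PER_SERVER = 64
MICROSERVICES_PER_SERVER = 200

# Assumed per-microservice overhead, in fractions of one core, spent on
# infrastructure work rather than on application logic.
OVERHEAD_PER_SERVICE = {
    "virtualization": 0.02,
    "networking":     0.03,
    "storage":        0.01,
    "security":       0.02,
}

tax_cores = MICROSERVICES_PER_SERVER * sum(OVERHEAD_PER_SERVICE.values())
tax_cores = min(tax_cores, CORES_PER_SERVER)  # can't tax more than exists

print(f"Cores consumed by infrastructure work: {tax_cores:.1f}")
print(f"Share of server lost to the tax: {tax_cores / CORES_PER_SERVER:.0%}")
```

Under these invented numbers, 16 of 64 cores go to overhead a DPU could absorb. Whether the real tax is 10 percent or 30 percent, the shape of the argument holds: the overhead scales with the number of microservices, not with the size of the server.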

You’d think that would be Arm’s cue. Writing for Seeking Alpha earlier this month, tech analyst Mark Hibben wrote, “In trying to buy Arm, Nvidia is further positioning itself as a new-paradigm semiconductor company. And it would be a marvelous thing for Nvidia if it could execute the acquisition as intended.”

So, here’s a little memo for the folks at Arm: That “B” in the headline doesn’t stand for “million.”

Distance to the Goalpost

While some perceive Nvidia’s $40 billion bid for Arm as a play for the edge, a reverse-angle examination of the deal reveals that this could be Arm’s play for the grand prize: the core of servers, and a secure, virtually permanent place in server reference architectures. But how soon do vendors expect to see an actual rack-scale disaggregated server component standard for the data center? If it’s really a trend, then we should at least know the difference between here and the goalpost.

How soon, we asked a panel of executives at Arm’s own virtual conference last week, can we expect to see a rack-scale standard specifying the framework for a CPU/GPU/DPU disaggregated server system in data centers, a reference similar to, if not directly part of, a framework like the Open Compute Project (OCP)?

“That one’s a tough one,” responded Karan Batta, VP of product at Oracle Cloud.  “I think it’s quite a while away. We’re still in the phase where disaggregation is an interesting topic.”

What may eventually decide whether it becomes more than “an interesting topic” is the establishment of a clear line between the needs of applications that would utilize disaggregated architectures and the performance gains they would realize once the lift-and-shift to the new architecture has occurred, Batta told Arm’s conference attendees. That clear line would reveal a break-even point, and with it the cost benefit.
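
Batta’s break-even framing can be made concrete with a minimal model. The sketch below is ours, not Oracle’s, and both inputs are hypothetical:

```python
# Minimal break-even model for a lift-and-shift to disaggregated hardware.
# Both inputs are hypothetical placeholders for illustration.

def months_to_break_even(migration_cost: float, monthly_savings: float) -> float:
    """Months until cumulative savings cover the one-time migration cost."""
    if monthly_savings <= 0:
        return float("inf")  # no savings, no break-even
    return migration_cost / monthly_savings

# Example: $2M to re-platform a fleet, $75K/month saved via higher utilization.
print(f"{months_to_break_even(2_000_000, 75_000):.1f} months")  # 26.7 months
```

If the hardware refresh cycle is shorter than the break-even horizon, the lift-and-shift never pays for itself, which is the clear line Batta is waiting to see.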

“I think it’s inevitable that we’ll move to disaggregated architectures over time,” responded Kushagra Vaid, Microsoft Azure’s VP and distinguished engineer.  “It has to happen, because that’s the only way you’ll be able to get full efficiency.”

Before that inevitable event can occur, Vaid continued, certain building blocks must evolve further than they have so far. One is the interconnect, the low-level network fabric that would permit the core components of servers to be disaggregated and re-coupled. Security over that interconnect has to improve, he said. That must take place without introducing a level of latency that would wipe out any benefits disaggregation could buy you.

“I think the hard part is going to be to disaggregate memory,” he continued, going for the full Hail Mary pass. The industry may get its first chance to decouple memory from the CPU (or any other processor on the bus) when one of the interconnect standards currently vying for prominence becomes the champion in its space, Vaid said. As examples he mentioned Intel’s Compute Express Link (CXL) and IBM’s Coherent Accelerator Processor Interface (CAPI), though he seemed to prefer CXL. That event could take place between 2023 and 2025, he predicted.
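
Vaid’s latency worry is easy to quantify with a standard weighted-average model. In the Python sketch below, the round-trip figures are assumptions in the neighborhood commonly cited for local DRAM versus a pooled, CXL-style hop; they are illustrative, not vendor specifications:

```python
# Effective memory latency when a fraction of accesses go to a pooled,
# CXL-style remote tier. Both latency figures are assumptions.

LOCAL_NS = 100   # assumed local DRAM round trip
REMOTE_NS = 400  # assumed pooled-memory round trip over the interconnect

def effective_latency_ns(remote_fraction: float) -> float:
    """Weighted average access latency for a given remote-access share."""
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

for share in (0.0, 0.1, 0.3, 0.5):
    ns = effective_latency_ns(share)
    print(f"{share:.0%} remote -> {ns:.0f} ns ({ns / LOCAL_NS:.2f}x local)")
```

Under these assumptions, routing even 10 percent of accesses to the pool adds 30 percent to average latency, which is why memory, in Vaid’s phrase, is “the hard part.”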

Kit Colbert, VMware’s CTO for cloud platform, acknowledged that PCI devices may now be exposed as virtual devices in a virtual infrastructure. Extending that capability to principal components such as memory requires “the right kind of interconnect.” Perhaps IP-based networks could be used instead, but performance issues still abound.

“I’m not sure if it’s four years, I’d say at least three years out,” Colbert predicted.  “From a maturity standpoint, there’s a lot of things that we want to do with SmartNIC before we get to the point — basic things like offloading both network storage and security, bare metal support, etc. And then eventually getting toward that rack-scale architecture, which has a lot of benefit, but it is still a ways out.”

Nitin Rao, Cloudflare’s senior VP for infrastructure, surprisingly suggested that the real purpose of disaggregation (driving up utilization, as he explained it) could be achieved by other means, and probably more expediently.

“If you have very heterogeneous workloads, and if you have the operational freedom to not promise customers which server their gear lands on, then it’s largely a load balancing problem. There are other ways to solve this,” Rao said. Between server generations, workloads may find themselves choking on one resource or another. To the extent that the choke point is compute, the problem is readily ameliorated simply by improving the CPU.
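
Rao’s alternative is, at bottom, a bin-packing problem. The toy scheduler below (our sketch, with invented capacities and job sizes) shows the mechanism: pack complementary workloads onto the same box and utilization rises without any new hardware:

```python
# Toy best-fit scheduler: pack heterogeneous jobs so complementary workloads
# share a server, raising utilization without disaggregating any hardware.
# All capacities and job demands are invented for illustration.

servers = [
    {"name": "s1", "cpu": 64, "mem": 256},  # free capacity remaining
    {"name": "s2", "cpu": 64, "mem": 256},
]

jobs = [
    {"cpu": 40, "mem": 32},   # compute-heavy
    {"cpu": 4,  "mem": 180},  # memory-heavy
    {"cpu": 20, "mem": 64},   # balanced
]

def place(job):
    """Best fit: pick the server the job fits into most tightly."""
    fits = [s for s in servers
            if s["cpu"] >= job["cpu"] and s["mem"] >= job["mem"]]
    if not fits:
        return None
    best = min(fits, key=lambda s: min(s["cpu"] - job["cpu"],
                                       s["mem"] - job["mem"]))
    best["cpu"] -= job["cpu"]
    best["mem"] -= job["mem"]
    return best["name"]

for job in jobs:
    print(job, "->", place(job))
```

The compute-heavy and memory-heavy jobs land together on s1, soaking up both resources; in Rao’s telling, that placement freedom, not disaggregation, is the cheaper route to utilization.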

Put another way, even with Moore’s Law appearing to have fallen into a sinkhole, as Cloudflare perceives it, CPU performance may be improving at a rate fast enough to eclipse any effort to expedite workloads through disaggregation over the next five years.

“The better processors we make,” Rao concluded, “the less we worry about server disaggregation.”

Mystery of the Missing Framework

For Arm’s own part, this week the intellectual property holder for a huge chunk of the world’s processors has been emphasizing the edge: smaller premises with tighter space and power constraints, where CPUs and SoCs with Arm cores are making headway. Its Project Cassini now includes an effort to extend its SystemReady certification process, designed to ensure that software developed for one Arm-based system-on-a-chip will run on every other Arm-based SoC… at the edge.

“Project Cassini is an open, collaborative initiative to ensure a cloud-native experience across a secure, Arm-based edge ecosystem,” stated Augustine Nebu Philips, Arm’s director of segment marketing. It seeks to bridge all edge-class processors by way of a general-purpose operating system, as opposed to the types of real-time operating systems (RTOS) that typically populate the embedded device market space. A list of just the “most popular” RTOS in current use yields 15 distinct systems.

DCK put the question directly to Arm, specifically to Chris Bergey, its senior VP and general manager for infrastructure. How soon should we anticipate a reference architecture for stacking, powering, and cooling disaggregated data server racks, whose processing nodes are chock full of Arm?

“You’re absolutely hitting on the pain points,” Bergey responded. “We’ve talked a lot about how computing has grown many-fold, yet power consumption has stayed relatively flat at the data center level. That’s because we’ve made a lot of these efficiency gains. But there’s a lot more we have to do.”

Arm entered the infrastructure space leading with its message of energy efficiency, he told us. “Honestly, the data center guys were like, ‘We’re plugged into the wall, we don’t really care, we don’t want your little cores — we want big, powerful processors.’ So that’s what we built. That’s what Neoverse is all about.”

Neoverse is Arm’s brand name for its infrastructure-class E1 and N1 cores. Ampere, you may recall, is one company producing a 128-core Altra Max processor using Neoverse cores, aiming to compete toe-to-toe with Intel Xeon and AMD Epyc in the densest servers. There are reference architectures for Neoverse-based processors; there have to be, because Arm doesn’t sell processors but instead licenses out its intellectual property. Arm is a member of the Open Compute Project. And yes, Arm is a participant in the OCP Server Project, which delivers scalable server specifications.

But that isn’t a specification for what Nvidia is aiming towards: an entirely new form factor based around an entirely new class of processor, enabling servers to be split apart and racks to be modularized in a new way. That could change how racks are powered and cooled, especially if the form factors for processor nodes were cut in half. Besides, a DPU is not a 128-core behemoth; for now it’s a card managed by just eight Armv8 Cortex-A72 cores [PDF].

As for how workloads should be distributed among Arm cores (arguably much more of a software issue than a hardware-configuration one), that’s a matter for Project Cassini. Yet, as Bergey acknowledged, that project focuses on an edge-oriented platform, not the larger data center.

Finally, we posed the question to Arm spokespeople officially: Is there (or will there be) a project (perhaps parallel to Cassini or Neoverse) to specify how a rack full of Nvidia’s dream Arm chips would be assembled and managed? We are told an answer is forthcoming. We may not be the only ones waiting.
