Mellanox InfiniBand Director switch

For Nvidia, $7 Billion Is What It Takes to Dominate AI Hardware

Here’s why the huge Mellanox price tag Nvidia has agreed to is probably worth it

The battle to dominate the AI computing market is heating up. As regular CPU performance gains shrink, this battle is increasingly playing out in the interconnect portion of the stack, the different technologies used to move data between processors, memory, storage devices, servers, and entire data centers.

This week, Nvidia announced a power move in this battle. The company best known for its graphics processors, which power video games, cryptocurrency mining, supercomputers, and deep neural networks (the dominant type of computing architecture for AI), said it will acquire Mellanox Technologies, the leading maker of low-latency, high-bandwidth interconnect technologies, which is based in Yokneam, Israel, and Sunnyvale, California.

At $6.9 billion, it will be the largest acquisition Nvidia has ever made. The Santa Clara, California-based chipmaker expects to close the deal by the end of the year. Its largest publicly disclosed purchase to date has been the $367 million Icera deal in 2011, an Nvidia spokesman told The Wall Street Journal.

The Mellanox acquisition is about strengthening Nvidia’s position in the data center market, which is currently responsible for about one-third of its revenue. Its data center business spiked in recent years, driven primarily by growth in deep learning and cryptocurrency mining. Nvidia’s GPUs have enjoyed widespread use in both categories. But sales of crypto mining hardware took a nosedive in the fourth quarter (following the trajectory of the value of digital currencies), and the company’s overall revenue growth stalled.

On a call with reporters Monday, Nvidia founder and CEO Jensen Huang said this was only a “temporary pause” for the company’s data center business, which has otherwise been growing faster than its other businesses. “The strategy behind this [Mellanox deal] is basically doubling down on data center,” he said.

The massive price tag appears to have been necessary for Nvidia to win a bidding war to gain control of technology crucial to its strategy, which now revolves around its data center business. Multiple news sites reported that competitors including Intel and Xilinx had been coveting Mellanox. The reports relied on anonymous sources.

Asked Monday whether it was true that Intel had been one of the bidders, Huang said, “I don’t know in the end exactly who – because there were a lot of rumors – but it was very competitive.” Asked the same question via email, an Intel spokesperson said, “Intel isn’t commenting on the market rumors.”

Why Mellanox Is a Big Deal in AI Hardware

Two capabilities have made Mellanox, and in particular its InfiniBand and Ethernet product lines, so successful in the HPC and deep learning markets, Chirag Dekate, senior director and analyst for AI infrastructure and HPC at Gartner, told Data Center Knowledge in an interview. First, both product lines enable RDMA (remote direct memory access), where one server in a cluster can use the memory of another server without involving their operating systems (a key capability for huge parallel-computing clusters). Second, they let switch processors handle network operation tasks that are otherwise handled by CPUs. This frees up CPU capacity for other tasks and helps avoid interconnect bottlenecks caused by too much “CPU-bound activity” in large clusters.

Intel buying Mellanox would have been bad news for Nvidia, which in NVLink and NVSwitch has its own powerful interconnect technologies for scale-up computing architectures (where processing power is increased within a single system) but relies on Mellanox for scale-out architectures (where processing power is increased by adding more server nodes to a computing “fabric”).

The role of scale-out computing architectures is growing in the deep-learning space, as companies start to build larger and larger deep neural networks, Dekate said. Nvidia’s GPUs are the dominant hardware accelerators that underpin these networks, and the company is betting on its ability to hold onto its dominant position. Intel buying Mellanox would likely force Nvidia to change the way it approached scale-out, Dekate said.

Most companies’ AI efforts are currently in development and pilot phases, Dave Driggers, CEO of Cirrascale, which sells access to HPC and AI hardware in its own data centers as a cloud service, told us. Deep learning is so new that most users aren’t yet scaling out the hardware platforms underneath those projects, he said. But as those efforts start moving toward production, scale-out architectures will become more important.

Intel owning Mellanox’s interconnect technologies in addition to its upcoming AI processors and GPUs would put Nvidia and others at a big disadvantage, Driggers said. The “Nvidia purchase was as much defensive as it was offensive,” he said. “Intel buying Mellanox was not going to be good for Nvidia.”

Intel’s own scale-out interconnect technology is called Omni-Path, a result of technology acquisitions from Cray and QLogic. But Omni-Path’s market share in the HPC interconnect space is small compared to InfiniBand and Ethernet. According to Driggers, there’s also been little interest in Omni-Path among builders of hardware systems for deep learning. “Unless you’re a pure Intel shop, there’s no interest in it,” he said.

A ‘Two-Prong’ Attack on the Data Center Market

If and when Nvidia and Mellanox close the deal, the chipmaker will have a powerful computing stack for deep learning, HPC, and big data analytics. The combination will make it successful with hyperscale cloud platforms and other cloud-native companies, but the chipmaker will have to make a serious effort to make it palatable for more traditional enterprises as well, Dekate said.

“They need to make sure that this capability is not perceived as a niche capability in the market,” he said. Some AI workloads will be executed in hyperscale clouds, but there will continue to be a class of end users who have to keep these workloads on-premises, in their own data centers, either for compliance reasons or simply because of “data gravity,” when the volume of data a company has is so large, it’s impractical to move it to the cloud.

Driggers said he didn’t see any potential downsides to the acquisition for Nvidia and Mellanox customers like Cirrascale. When the word had gotten out that Mellanox management were shopping the company around, he and some of his colleagues tried to guess who the buyer would end up being, he recalled. Intel, Xilinx, AMD, IBM, and Broadcom were all on their list, but not Nvidia, he said. The announcement surprised them, as it did others. (Nvidia has made few acquisitions outside its core graphics processing space and never a deal of such magnitude.)

“I think it was expensive for Nvidia,” Driggers said. But if it closes the deal, the company will have full control of premier interconnect technologies for scale-up and scale-out and its premier accelerators. “It gives them a two-prong approach for holding onto the data center, which is what they want to do,” he said.
