When Cavium first unveiled what it called Octeon Fusion processors in 2011, they were MIPS-based devices designed to function as “base stations-on-a-chip” for telcos. A 4G LTE base station capable of serving up to 300 simultaneous users could, for the first time, fit into a form factor about the size of a PC’s graphics card. Octeon is what made Cavium what it was.
Then chipmaker Marvell acquired Cavium in 2017, immediately launching speculation as to how long Cavium’s IP would last in Marvell’s equipment. The answer turned out to be longer than many folks thought. On Monday, Marvell announced it would begin sampling four SKUs of Octeon 10, its latest generation of network processor, in the second half of this year. (The pandemic, with its continuing effect on supply channels, has made more precise timing harder to pin down.)
With this generation, Marvell has opted to forgo custom-designed Cavium cores built on Arm technology, placing its bets instead all-in on Arm Neoverse N2 cores, which Arm announced in early May.
De-inventing the Wheel
“Do we continue to invest in something that Arm is very good at?” asked John Sakamoto, Marvell’s vice president for its infrastructure processor business unit, recalling the question that he says his company began asking around the table two years ago.
“If you’re looking at what they’re doing now with Neoverse,” Sakamoto continued, “they’re targeting that infrastructure market, and they’re doing very good at optimizing that performance power. And we looked at what they were doing versus what we could engineer ourselves, and we’re like, ‘Arm can do it better than we can. Why use that resource? Let’s just use the Arm core.’”
That’s the opposite of the decision Ampere announced a few weeks ago. That company made the case for transitioning to its own core design as it adopts the 5 nm lithography node now being championed by fabricators such as Taiwan’s TSMC. Marvell cites Octeon’s move to the 5 nm node as part of its logic for switching to Neoverse N2 cores.
“With this combination of N2 plus TSMC, we really focus on the best performance-per-watt,” said Sakamoto. The objective there, he said, was to bring the energy consumption below a certain heat threshold, making it feasible for components built on Octeon 10 to be passively cooled.
“There’s no active cooling in these use cases in the RAN [radio access network],” he explained. “You’ve got to have passive cooling, and you can run a 400-gigabit DPU into these use cases. Put aside the costs, the power — you have to be 60 W or below to run in these cabinets that have no active cooling.”
One can argue that few enterprise data centers will operate under the same climate and power constraints as an edge, or edge-style, micro data center alongside a 5G RAN. But in a way, that’s the point: If a passively cooled processing unit with an integrated 1-terabit switch and 16 simultaneous 50-gigabit Ethernet ports can be squeezed into a little box that sips 60 W or less, the case for parallel 250-watt-plus CPUs, which tend to run the same functions repeatedly but not quite as well, falls apart precipitously.
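As a rough, back-of-envelope sketch of that efficiency claim, using only the figures quoted above (16 ports at 50 Gb/s, a 60 W power cap; the function names below are ours, not Marvell's):

```python
# Back-of-envelope arithmetic for the efficiency claim: port bandwidth
# and power budget are the figures quoted in the article; everything
# else is simple division, not a measured or vendor-confirmed number.

def aggregate_gbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate Ethernet bandwidth across all external ports, in Gb/s."""
    return ports * gbps_per_port

def gbps_per_watt(total_gbps: float, watts: float) -> float:
    """Throughput delivered per watt of the stated power budget."""
    return total_gbps / watts

ethernet = aggregate_gbps(ports=16, gbps_per_port=50)  # 800 Gb/s of port bandwidth
efficiency = gbps_per_watt(ethernet, watts=60)         # at the passive-cooling cap

print(f"{ethernet:.0f} Gb/s across the ports")
print(f"{efficiency:.1f} Gb/s per watt at the 60 W cap")
```

Note the external ports alone account for 800 Gb/s; the 1-terabit figure for the integrated switch presumably covers additional internal fabric capacity beyond those 16 ports.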
In its initial rollout, Octeon 10 will be available in four configurations, with 8, 24, 24, and 36 Neoverse N2 cores, respectively. All four configurations will support their own on-board DDR5 memory controllers, indicating that an industry-wide move to shared system DRAM won’t be happening all that soon. The top-of-the-line DPU400 will support 12 DDR5 controllers, each rated at 5200 MT/s (5.2 billion transfers per second).
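To put those controller specs in bandwidth terms, here is a minimal sketch. The 12-controller count and the 5200 MT/s rating come from the article; the 64-bit (8-byte) data path per controller is our assumption based on a standard DDR5 channel, not a confirmed Octeon 10 specification:

```python
# Hedged peak-bandwidth sketch for the DPU400's memory subsystem.
# 12 controllers at 5200 MT/s are from the article; the 8-byte
# transfer width is an assumed standard DDR5 channel width.

BYTES_PER_TRANSFER = 8  # assumed 64-bit data path per controller

def peak_gb_per_s(mt_per_s: int) -> float:
    """Peak bandwidth of one controller, in GB/s (decimal gigabytes)."""
    return mt_per_s * 1_000_000 * BYTES_PER_TRANSFER / 1e9

per_controller = peak_gb_per_s(5200)   # 41.6 GB/s per controller
aggregate = per_controller * 12        # across all 12 DPU400 controllers

print(f"{per_controller:.1f} GB/s per controller, {aggregate:.1f} GB/s aggregate")
```

Under that channel-width assumption, the DPU400 would top out near half a terabyte per second of theoretical memory bandwidth, which is the kind of headroom a 400-gigabit data path requires.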
It won’t all be just for raw data transfers, we’re told. As network equipment gains more and more ability to detect potential points of failure through real-time analysis, DPUs will need to become stand-alone inference engines. Rather than centralizing all the AI on one core, a 5G network will need to distribute its analysis power throughout all nodes in the RAN.
If you think of this scenario in terms of how the Apollo space program ended up revolutionizing factory production, you can see where this could lead enterprise data centers: DCIM as we have come to know it could be rendered obsolete by components capable of handling their own healing, as well as participating collectively in a distributed management scheme. Centralized management could end... and there would be much rejoicing.
A DPU or an IPU?
In the meantime, however, there may still be some confusion to clear up. While manufacturers such as Marvell and Nvidia move their SmartNIC designs toward a platform they call a “DPU,” Intel is moving forward with what it calls an “IPU,” which is already absorbing what it used to call SmartNICs.
“That announcement [from Intel] is important for a couple of reasons,” remarked Marvell’s Sakamoto. “First is, they admitted publicly that cycles are going to come off the main compute node. That’s a huge departure. . . It’s super-important that they acknowledge that a DPU exists, and it will exist, and this is inevitable. Cycles are coming off the processor.”
Sakamoto is a nearly four-year veteran of Intel, and before that, spent 21-and-a-half years at FPGA accelerator maker Altera, up until Intel acquired it in 2015. From his perspective, the two categories are essentially the same, regardless of the choice of initial letter. “It will be very unusual to see an Octeon 10 sitting next to an FPGA on a SmartNIC. That’s not going to happen. It’s either/or. We will try to do the same functions.”
Perhaps Intel will continue trying to expand the IPU market category, Sakamoto believes, maybe by leveraging its new processor’s ties to the Xeon D class, ties its SmartNICs already had. But once system builders are given design choices, they will evaluate IPUs and Octeon DPUs using direct comparisons.
Marvell will be making available a developer platform unit for Octeon 10, mounted on a PCIe 5.0 card, in the fourth quarter of this year.