When Microsoft realized that the software-defined networking (SDN) stack providing Azure customers with virtual networks wouldn’t keep up once 40Gbps and 100Gbps NICs arrived, it took the same FPGAs it was already running Bing on in its data centers and turned them into smart network cards. The company had been using expensive CPU cores for the SDN, cores it would much rather have running cloud VMs it could charge for.
Running the virtual switch on the NIC rather than the server freed up CPU cores for customer workloads and reduced network latency tenfold, according to the cloud provider. The same SmartNICs could take over tasks like cryptography operations, QoS, and RDMA storage acceleration.
Amazon designed similar SmartNICs based on custom Arm chips. Its Nitro system now includes a range of hardware that delivers everything from storage acceleration to replacing UEFI, giving VMs performance that’s closer to that of bare metal. The Nitro Card for VPC handles routing and network-packet encapsulation and decapsulation, implements EC2 security groups, and supports the kind of custom network acceleration typically found in supercomputers.
SmartNIC vendors like Mellanox promise similar benefits for enterprise data centers. Mellanox’s BlueField SmartNICs virtualize network storage for faster provisioning, speed up AI workloads by accelerating network traffic, and reduce the performance impact of security protocols.
BlueField can handle network virtualization, including offloading DPDK-style packet processing with ASAP2 (Accelerated Switching and Packet Processing). JD.com uses ASAP2 on a BlueField SmartNIC to monitor network traffic for denial-of-service attacks without having to send malicious packets to the CPU for analysis (which would hurt performance exactly the way an attacker would want). BlueField can also accelerate storage networking options like RDMA, NVMe over Fabrics (NVMe-oF), compression, and encryption.
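To make the offload concrete, here is a minimal sketch of the kind of per-source rate check a DDoS filter running on a SmartNIC performs before a packet ever reaches the host. This is a hypothetical plain-C illustration, not Mellanox code: in a real BlueField deployment this logic would be expressed as an ASAP2 flow rule or an eBPF program, and the table and limit values here are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a SmartNIC-resident DDoS rate limiter.
 * Each IPv4 source gets a packet counter for the current time
 * window; sources that exceed the limit are dropped on the NIC,
 * so flood traffic never crosses PCIe to burden the host CPU. */

#define TABLE_SIZE 1024   /* direct-mapped flow table (illustrative size) */
#define RATE_LIMIT 100    /* max packets per source per window (invented) */

struct flow_entry {
    uint32_t src_ip;  /* IPv4 source address owning this slot */
    uint32_t count;   /* packets seen from it this window */
};

static struct flow_entry table[TABLE_SIZE];

/* Returns true if the packet should be passed up to the host,
 * false if the NIC should drop it at the edge. */
bool nic_admit(uint32_t src_ip)
{
    struct flow_entry *e = &table[src_ip % TABLE_SIZE];
    if (e->src_ip != src_ip) {   /* a new flow takes over the slot */
        e->src_ip = src_ip;
        e->count = 0;
    }
    return ++e->count <= RATE_LIMIT;
}

/* Invoked by NIC firmware at the end of each time window. */
void nic_reset_window(void)
{
    memset(table, 0, sizeof(table));
}
```

The point of doing this on the NIC rather than in software is exactly the one JD.com cites: the host only ever sees traffic that has already passed the check.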
More Like Cloud
That’s how data centers can get the kind of performance improvements a new generation of servers used to deliver, Kevin Deierling, Mellanox VP of marketing, told Data Center Knowledge.
“We're not going to have faster and bigger computers at the same pace that we used to,” he said. “We're just going to have more and more of them, and we're going to build giant clusters.” As the systems scale out, the interconnect piece needs to speed up. This is why hyperscalers went to 25Gbps, then to 50Gbps, and then to 100Gbps, he explained. “They saw these problems, they saw the massive datasets. They realized they weren't scaling, and that they needed to invest not in these new CPUs but in interconnect. If you look at where their innovation is today, it's on the interconnect side.”
Most application performance gains will no longer come from the linear Moore’s Law increases in CPU power but from network accelerators, Deierling went on. “Wouldn't it be nice if all the GPUs in your cloud could look like they were local? Wouldn't it be nice if all the NVMe flash drives in the cloud looked like they were local? We could do that with the SmartNIC.”
Some data center users already see the appeal. When Futuriom asked 200 enterprise IT pros worldwide which approaches matter most for boosting data center performance, improving network efficiency with SmartNICs and processor offload topped the list, closely followed by making application code more efficient. Respondents expected SmartNICs to improve the efficiency of virtualization for compute, storage, and networking, and SmartNICs were the top choice for getting more out of hyperconverged infrastructure. Throwing more hardware at the problem (by upgrading network bandwidth or deploying more servers) was the least popular idea.
But how many of the benefits hyperscale cloud platforms get from SmartNICs can you expect to see in your own data center? Only 17 percent of those surveyed wanted to adopt the automated infrastructure deployment, management, and monitoring that makes hyperscale cloud work, while twice as many wanted the same virtualization efficiency cloud offers, which suggests they may be overestimating how ready they are to take advantage of SmartNICs.
Deierling’s definition of a SmartNIC is “a combination of a C-programmable engine and a bunch of intelligent accelerators that use standard APIs.” Whether you’re programming that with Mellanox’s ASAP2 or extended Berkeley Packet Filters, it requires a certain level of expertise.
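The expertise bar Deierling mentions is visible even in a trivial filter: eBPF-style packet programs parse raw headers by hand, byte by byte. The sketch below shows that flavor of work in plain C; an actual XDP program would use the kernel's BPF headers, be compiled with clang, and pass the eBPF verifier before it could be offloaded, and the "drop all UDP" policy here is an invented example.

```c
#include <stdint.h>
#include <stddef.h>

/* Plain-C sketch of the header parsing an eBPF/XDP-style filter does.
 * The verdicts mirror XDP_DROP / XDP_PASS. Policy (for illustration
 * only): drop UDP-over-IPv4, pass everything else. */

enum verdict { DROP = 0, PASS = 1 };

#define ETH_HLEN      14      /* Ethernet header length */
#define ETHERTYPE_IP  0x0800  /* IPv4 */
#define PROTO_UDP     17      /* IPv4 protocol number for UDP */

enum verdict filter_packet(const uint8_t *pkt, size_t len)
{
    if (len < ETH_HLEN)
        return DROP;                     /* truncated frame */
    uint16_t ethertype = (uint16_t)(pkt[12] << 8) | pkt[13];
    if (ethertype != ETHERTYPE_IP)
        return PASS;                     /* not IPv4: let it through */
    if (len < ETH_HLEN + 20)
        return DROP;                     /* too short for an IPv4 header */
    uint8_t proto = pkt[ETH_HLEN + 9];   /* IPv4 protocol field */
    return proto == PROTO_UDP ? DROP : PASS;
}
```

Every bounds check here is mandatory in real eBPF code, and the verifier rejects programs that omit them, which is part of why this style of programming takes specialist skills.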
If your organization has the right skills, you can get the same kind of acceleration in your data centers as you can with Nitro on AWS (which requires more esoteric programming tools like Libfabric or the Nvidia Collective Communications Library). But you’re not going to get the same agility as the Azure networking team, which uses FPGAs in the SmartNICs rather than the Arm chips that both AWS and Mellanox chose.
“All the SmartNICs in Azure are upgradeable, and we upgrade them every quarter with new functionality,” Yousef Khalidi, corporate VP for Azure networking, told us. That’s not just updating the tables of rules Azure uses for transforming packets in its virtual network, but the configuration of the SmartNIC itself, as Microsoft discovers more efficient ways to run the hardware that makes up the network card.
Mellanox does offer an FPGA-based SmartNIC, the Innova, but primarily as a way of getting new functionality to market faster than it can design and ship a new ASIC. You can add new acceleration to BlueField using ASAP2 or eBPF, but you can’t keep reworking how the SmartNIC itself operates the way Azure can. And the Azure networking team hasn’t run out of new ideas for taking advantage of that reprogrammability yet, Khalidi said.
Smart Enough for SmartNICs?
“The use of SmartNICs is a topic that is just beginning to gain interest,” Ovum analyst Roy Illsley said. “The hyperscalers have the technical resources and scale to take full advantage [of them], so cloud is where this technology has first been seen to evolve.”
But the approach is now sparking interest in the world of corporate data centers, as workloads become more complex, running on a combination of different technologies, such as VMs, cloud-native, legacy hardware, and so on, Illsley explained. Enterprise data center operators find themselves needing more control over their network traffic and the ability to offload work from server CPUs.
Also driving interest in SmartNICs is the rise of edge computing and the separation of network data planes from network control planes to deal with federated company data assets. “SmartNICs offer a potential benefit in security of these more complex environments,” he said.
Only operators of the biggest data centers are considering or adopting SmartNICs today – Deierling called them “tier-two cloud enterprises” – but Illsley suggested the potential market could be larger and not just for better data center performance but also to improve operational efficiency. “The network in a cloudy world is the key to connecting all the elements and making them operate correctly and securely.”
How much larger will depend on how much easier SmartNICs become to work with, and how much more clearly their value can be explained. “It needs the technology to be simplified and its value proposition to be more clearly articulated in business terms. It is still too geeky.”