Microsoft Azure is running more and more Linux workloads. About 40 percent of the VMs in its IaaS cloud are Linux instances, and that’s not counting the popular Azure Kubernetes Service. With Linux that prevalent, it’s easy to forget that the company’s hyperscale cloud is built on the same Windows Server OS that runs in many of the world’s non-Azure data centers.
And the company constantly tweaks both OSes Azure supports. Earlier this month at its Build conference in Seattle, Azure CTO Mark Russinovich shared some of the ways it fine-tunes Windows and Linux to improve performance and reliability of the cloud platform.
The Azure host OS is a minimal, stripped-down build of the Server Core installation of Windows Server 2016. It runs a minimal set of server roles, and all unused features and drivers are disabled. All language packs except English are removed, and only native x64 code runs on the host OS.
But it also has some extra features to allow hot-patching and updates without disruptive reboots. “We’ve added some enhancements over time to improve our ability to roll out patches to the hypervisor and the virtualization stack without causing impact to customer VMs,” Russinovich said.
The more of the OS you might need to replace, whether to fix bugs or to add new features, the longer the update will take. The slowest option is a technique Microsoft has patented under the name Virtual Machine Preserving Host Update, or VM-PHU. It pauses the VMs and replaces the whole OS.
“We freeze the virtual machines in RAM, we reboot the operating system with a new version underneath them, and then we resume the virtual machines,” the CTO explained. That takes about 15 seconds, but he expects that number to drop below 10 seconds – or fast enough for the network connection not to be affected – soon.
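The freeze-reboot-resume sequence Russinovich describes can be sketched at a very high level. The Python toy below models only the ordering of the steps; every name in it is invented for illustration and bears no relation to the real Azure host implementation:

```python
# Conceptual sketch of the VM-Preserving Host Update (VM-PHU) flow:
# freeze guest state in RAM, swap the host OS underneath, then resume.
# All class and function names here are illustrative, not real Azure code.

class VM:
    def __init__(self, name, state):
        self.name = name
        self.state = state      # stands in for guest memory/register state
        self.running = True

def vm_phu(vms, reboot_host):
    """Pause every VM, reboot the host OS, then resume the VMs."""
    frozen = {}
    for vm in vms:                   # 1. freeze each VM, keeping state in RAM
        vm.running = False
        frozen[vm.name] = vm.state
    reboot_host()                    # 2. boot the new host OS version
    for vm in vms:                   # 3. restore state and resume
        vm.state = frozen[vm.name]
        vm.running = True
```

The point of the sketch is that guest state never leaves RAM, which is why the pause is measured in seconds rather than the minutes a live migration would take.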
“There are other versions [of this technique] where you freeze the virtual machine and update just parts of the OS and virtualization stack and cause less downtime,” he said. The “light” version of VM-PHU replaces the whole virtualization stack except the hypervisor, updating the networking and storage stack in about 3 seconds.
The “ultra-light” version is much faster (under a second), and it works with VMs that have direct device assignment – such as GPUs – without requiring changes to guest device drivers or applications. Before suspending the VM in memory, it pauses and queues device interrupts and disconnects the devices; after the update, it reconnects the devices, restores the VM state, resumes the VM, and unpauses device interrupts.
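The ordering of those steps matters: the device must be quiesced before the VM is suspended, and reattached before the VM runs again. A minimal Python sketch of that sequence, with all names invented for illustration:

```python
# Hypothetical sketch of the "ultra-light" update ordering for VMs with
# directly assigned devices (e.g. GPUs). Names are illustrative only.

class Device:
    def pause_interrupts(self, log): log.append("pause_irq")    # queue, don't drop
    def disconnect(self, log): log.append("disconnect")
    def reconnect(self, log): log.append("reconnect")
    def unpause_interrupts(self, log): log.append("unpause_irq")

class GuestVM:
    def suspend_in_memory(self, log): log.append("suspend")
    def resume(self, log): log.append("resume")

def ultra_light_update(vm, device, apply_patch):
    """Run the quiesce -> patch -> reattach sequence, returning the step log."""
    log = []
    device.pause_interrupts(log)    # queue device interrupts so none are lost
    device.disconnect(log)          # detach the assigned device from the VM
    vm.suspend_in_memory(log)       # freeze guest state in RAM
    apply_patch(log)                # update everything but the hypervisor core
    device.reconnect(log)           # reattach before the VM state is restored
    vm.resume(log)
    device.unpause_interrupts(log)  # replay the queued interrupts
    return log
```

Because interrupts are queued rather than dropped, the guest driver never observes the device disappearing, which is why no guest-side changes are needed.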
“You can update everything except one small piece of the virtualization system and the hypervisor,” Russinovich said. “Anything else on the server you can update, and what makes this really cool is this works independently of how big the server is.”
Hyper-V, the Windows hypervisor, can also now be updated with hot-patching without any downtime. “We’ve used hot-patching in production in other parts of the virtualization stack about 50 times in the last year or so and seen no impact to customers at all,” Russinovich said.
Hot-patching works by replacing functions in running binaries with new code that fixes whatever bug you’re patching, typically by redirecting calls from the old function to the patched version. The processors are paused and interrupts are disabled; the hot patch inserts the new code; then the processors are unpaused and interrupts are re-enabled. This works with both kernel-mode and user-mode binaries.
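The idea of atomically swapping a function under live callers can be shown with a toy analogy. The Python below is not the Windows mechanism (which patches machine code in place with CPUs paused); it only mimics the concept with an indirection table, and every name is invented:

```python
import math
import threading

# Toy analogy of function-level hot-patching: all callers go through an
# indirection point, so swapping the target fixes the bug without a restart.
# The lock stands in for pausing processors/disabling interrupts during the
# swap. This is an illustration, not the real Windows hot-patch mechanism.

_patch_lock = threading.Lock()

def buggy_area(r):
    return 3.14 * r * r          # imprecise constant: the "bug" to fix

def fixed_area(r):
    return math.pi * r * r       # the hot-patched replacement

_current = {"area": buggy_area}  # indirection table (stand-in for the
                                 # patched call target in the binary)

def area(r):
    return _current["area"](r)   # callers always dispatch through the table

def hot_patch(name, new_fn):
    with _patch_lock:            # analogous to the paused-CPU window
        _current[name] = new_fn
```

After `hot_patch("area", fixed_area)`, every subsequent call picks up the fix, while in-flight calls complete against the old code, which mirrors why the real technique causes no downtime.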
Speeding Up Linux VM-to-VM Connectivity
To speed up networking for Linux VMs, Azure is starting to use DPDK, the Data Plane Development Kit. It is a set of data plane libraries and NIC drivers (created by Intel and now sponsored by the Linux Foundation) for faster packet processing. It can offer high throughput and network performance even when the network is virtualized.
DPDK offers low enough latency to create network appliances like load balancers and application gateways in the cloud, or to speed up throughput between VMs in a multi-VM workload. A10 Networks has demonstrated 30Gbps connectivity between Azure VMs using DPDK.
DPDK moves packet processing out of the Linux kernel: its poll-mode drivers run in user space inside the Linux VM, bypassing the kernel network stack. Instead of using interrupts to process packets coming from the network, it uses polling. “Everyone thinks polling is bad, but not when you're processing millions of packets per second; in that case, you want to poll and then get huge batches [of packets] to process and not be interrupted every time something new comes in,” Russinovich pointed out.
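In real DPDK code that loop is written in C around `rte_eth_rx_burst()`; the Python below is only a simulation of the polling-and-batching idea, with invented names and a queue standing in for the NIC receive ring:

```python
from collections import deque

# Minimal sketch of a DPDK-style poll-mode receive loop: spin on the NIC
# ring and pull packets in bursts, instead of taking one interrupt per
# packet. Names are invented; real DPDK uses rte_eth_rx_burst() in C.

BURST_SIZE = 32

def poll_loop(rx_ring, handle_batch, max_iters):
    """Poll the ring up to max_iters times, handing off bursts of packets."""
    for _ in range(max_iters):
        burst = []
        while rx_ring and len(burst) < BURST_SIZE:   # drain up to one burst
            burst.append(rx_ring.popleft())
        if burst:
            handle_batch(burst)      # amortize per-packet cost over the batch
```

Batching is the key: the fixed cost of waking up and dispatching is paid once per burst of up to 32 packets rather than once per packet, which is what makes polling win at millions of packets per second.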
The standard latency between two VMs in a single Azure region is about 190 microseconds. The FPGA-accelerated smart NICs that drive Azure’s new Accelerated Networking option take that down to 30 microseconds, but with poll-mode DPDK drivers that drops to just 6 microseconds.
“We're seeing amazing jumps in performance and latency from what we have with the standard network stack in the OS,” Russinovich said, noting that you’d typically need more expensive and complex network hardware to achieve this. “You're getting down into custom InfiniBand-type network performance for network latencies – and this is without RDMA. This is over standard Ethernet inside the data center.”
DPDK isn’t Azure-specific; developers will be able to write network appliance applications that can run on other platforms, such as AWS or OpenStack. Russinovich predicted even higher performance with later versions of DPDK. Any Azure improvements will be contributed back to the Linux kernel, he said.