Here’s How Microsoft Keeps the Cost of Its Network Backbone Down

Highly dynamic network management allows the company to defer expensive fiber redeployments.

Mary Branscombe

March 2, 2018

Marea submarine cable, funded by Facebook, Microsoft, and Telxius, running into the ocean in Virginia Beach, Virginia. (Photo: Run Studios/Microsoft)

Optical networks are expensive. A cloud provider like Microsoft spends tens of thousands of dollars just on optical equipment to deliver every 100Gbps increment in speed (leaving aside the cost of buying or leasing fiber). Microsoft’s US backbone network alone needs hundreds of terabits per second of bandwidth between 35 cities spread over thousands of miles of fiber. Worldwide, that adds up to hundreds of millions of dollars building and interconnecting data center networks for cloud services. As bandwidth needs go up, optical costs have to come down.

What a cloud operator needs from an intercity fiber network is simpler than the requirements of a typical tier-one telco: with point-to-point segments that carry only packet-based traffic, capacity and coherent transmission matter more than mesh connectivity or managing contention. On the other hand, the mix of network traffic on the Microsoft backbone, spanning Office 365, Azure, OneDrive (cloud storage), Bing, and a range of other workloads, takes a lot of planning and prioritization. Enterprise applications need near-perfect availability, high capacity, and the flexibility to keep up with unpredictable customer demand. Replicating storage between Azure data centers generates a lot of traffic, but it can be planned for and scheduled around higher-priority traffic.


Rather than taking the usual approach to increasing capacity (adding more wavelengths and lighting up more fiber), the Azure network team looked for ways to make the fiber it had already laid carry more bits, as it explained at the recent ACM Workshop on Hot Topics in Networks.

Thanks to advances in Dense Wavelength Division Multiplexing (which assigns incoming signals to specific light frequencies and then amplifies all the signals down a single fiber), together with reconfigurable multiplexers that can switch wavelengths between ports very efficiently, elastic optical networks can use spectrum that wouldn’t be available in the usual fixed configuration. Combine that with bandwidth-variable transceivers and traffic engineering via software-defined wide-area networking (Microsoft’s SWAN), which lets Azure engineers mix and match different form factors of transponder line cards and high-density interconnects, and Azure will be able to keep using the fiber it has laid for much longer before having to roll out new cabling.

Changing Modulations

Back in 2015, Microsoft started looking at just how much more capacity its fiber could deliver, collecting signal-quality data from all the 100Gbps line cards across its entire North American fiber backbone every 15 minutes and comparing it against a 4,000km test network in Microsoft Research labs. Fiber networks usually stick to a fixed modulation for a fixed capacity; a 100Gbps line is always a 100Gbps line. But could the fiber actually handle more data?


Looking at the first three months of data, the team found that by configuring the modulation of each channel with bandwidth-variable transmitters, they could get 70 percent more network capacity out of the same fiber cables. That increase was a mix of 100, 150, and even 200Gbps speeds, with most of the traffic running at 150Gbps. Making the speed increases more granular, using 25Gbps rather than 50Gbps increments, used the fiber even more efficiently, pushing the capacity gain to 86 percent, with most links reaching 175 or 200Gbps (and smaller proportions running at 150 and 225Gbps).

A few months of network data could be misleading; it might capture unusually low traffic or an odd mix of workloads, so Microsoft collected two and a half years of data. Based on that, 99 percent of the 100Gbps segments in the Microsoft backbone in North America could run at 150Gbps just by changing the modulation format, without switching out the fiber or even the intermediate amplifiers. In fact, 80 percent of the links could run at 175Gbps, and 34 percent could even run twice as fast, at 200Gbps.
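As a sanity check on those percentages, a short calculation (an illustration, not from the article) converts the cumulative "could run at X Gbps" figures into an average per-link rate and an overall capacity gain over a fixed 100Gbps configuration:

```python
# Cumulative fractions of links that can sustain each rate (from the article).
cumulative = {150: 0.99, 175: 0.80, 200: 0.34}

# Fraction of links whose *best* rate is exactly each value.
exactly = {
    200: cumulative[200],                      # 0.34
    175: cumulative[175] - cumulative[200],    # 0.46
    150: cumulative[150] - cumulative[175],    # 0.19
    100: 1.00 - cumulative[150],               # 0.01
}

# Weighted average link rate, and the gain over fixed 100Gbps lines.
avg_rate = sum(rate * frac for rate, frac in exactly.items())
gain = avg_rate / 100 - 1
print(f"average link rate: {avg_rate:.0f} Gbps, gain: {gain:.0%}")
# -> average link rate: 178 Gbps, gain: 78%
```

That back-of-envelope 78 percent sits broadly in line with the 70 to 86 percent gains Microsoft measured.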

To get the extra 145 terabits per second that adds up to across the network, Azure started buying bandwidth-variable transceivers that can switch between 100, 150, and 200Gbps depending on the signal-to-noise ratio of the fiber path – much the way Wi-Fi or DSL connections speed up and slow down depending on the quality of the network.

Collecting the network data showed that the signal quality of particular network links usually stays the same, except for dips caused by problems with either the fiber or the optical hardware. Those dips are why most fiber is overprovisioned; it only takes a signal-to-noise ratio of 6.5dB to carry 100Gbps of traffic, but Microsoft’s 100Gbps links typically had 12dB – which is why they could carry up to twice as much traffic.
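A bandwidth-variable transceiver's rate selection can be sketched as a simple lookup: pick the fastest modulation the measured signal-to-noise ratio supports. The 6.5dB threshold for 100Gbps and the 3dB floor for 50Gbps come from the article; the higher thresholds here are illustrative assumptions, not Microsoft's actual figures.

```python
# (rate in Gbps, minimum SNR in dB), ordered from fastest to slowest.
# 100Gbps at 6.5 dB and 50Gbps at 3 dB are from the article;
# the 150/175/200Gbps thresholds are assumed for illustration.
RATE_TABLE = [
    (200, 11.0),   # assumed threshold
    (175, 9.5),    # assumed threshold
    (150, 8.0),    # assumed threshold
    (100, 6.5),    # from the article
    (50, 3.0),     # from the article (degraded links)
]

def best_rate(snr_db: float) -> int:
    """Return the fastest rate (Gbps) the link's SNR supports, or 0."""
    for rate, min_snr in RATE_TABLE:
        if snr_db >= min_snr:
            return rate
    return 0  # link unusable

print(best_rate(12.0))  # a typical Azure 100Gbps link at 12 dB -> 200
print(best_rate(6.5))   # just enough headroom for 100Gbps -> 100
```

Under these assumed thresholds, a typical 12dB link has room to double its rate, which matches the overprovisioning the article describes.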

The data also showed that failures don’t increase much at 175Gbps, but they do become a problem at 200Gbps, and some of those failures could last for hours (which is why Azure picked transceivers that can vary their modulation rather than just cranking the speeds up permanently).

Annoyingly, those transceivers have to power down the network link to change the modulation of the connection, which takes an average of 68 seconds and looks like a failure in the network. Most of that time is spent turning the laser that sends the optical data back on; without turning off the laser, changing the modulation would take more like 35 milliseconds. So if modulating the bandwidth of a fiber link to match the signal is going to become common, transceivers will need to be designed to help network operators maximize capacity, rather than for cautious operators who overprovision fixed-capacity lines to avoid the high latency of changing link speeds.
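That 68-second outage implies a simple trade-off a controller might weigh (a hypothetical sketch, not Microsoft's actual policy): a modulation change only pays off if the link holds the higher rate long enough for the extra throughput to outweigh the bits lost while the laser restarts.

```python
def switch_pays_off(old_gbps: float, new_gbps: float,
                    downtime_s: float = 68.0,
                    expected_hold_s: float = 3600.0) -> bool:
    """True if the extra throughput outweighs bits lost to downtime.

    downtime_s = 68.0 is the average reported in the article;
    expected_hold_s is an assumed forecast of how long the link
    will sustain the new rate.
    """
    lost_bits = old_gbps * downtime_s            # traffic forgone during the switch
    extra_bits = (new_gbps - old_gbps) * expected_hold_s
    return extra_bits > lost_bits

# Upgrading 150 -> 200Gbps is worth it if the link holds for an hour...
print(switch_pays_off(150, 200, expected_hold_s=3600))   # True
# ...but not for a transient three-minute window.
print(switch_pays_off(150, 200, expected_hold_s=180))    # False
```

With a 35-millisecond switch instead, the lost-bits term all but vanishes and almost any upgrade pays off, which is why the article argues transceiver design needs to change.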

There’s another reason to want to make those changes: often, dropping to a lower bandwidth would let a fiber link keep carrying traffic instead of stopping work completely. The majority of fiber network failures don’t mean the fiber isn’t working at all; cut fibers and power failures are relatively rare (and the most common reason for a fiber link being down is planned maintenance, like replacing a line card).

Looking at seven months’ worth of unplanned failures in the Azure fiber network, in 90 percent of those cases the signal-to-noise ratio on a ‘failed’ link was still over 3dB – which is enough to run 50Gbps. So being able to switch link speed on demand could improve availability as well as capacity – although designing a network traffic controller to deal with those changes makes the network more complicated to run.
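A controller reacting to such a failure event might look like this hypothetical handler: rather than declaring the link dead, it steps down to whatever rate the residual signal supports. Again, the 6.5dB and 3dB thresholds are from the article and the rest are assumptions.

```python
# (rate in Gbps, minimum SNR in dB); 100Gbps/6.5 dB and 50Gbps/3 dB
# are from the article, the others are assumed for illustration.
RATES_DB = [(200, 11.0), (150, 8.0), (100, 6.5), (50, 3.0)]

def handle_degradation(snr_db: float) -> str:
    """Decide what to do with a link whose signal quality just dropped."""
    for rate, min_snr in RATES_DB:
        if snr_db >= min_snr:
            return f"keep link up at {rate}Gbps"
    return "take link down for repair"

# 90 percent of 'failed' links still cleared 3 dB in Microsoft's data:
print(handle_degradation(3.4))  # keep link up at 50Gbps
print(handle_degradation(1.0))  # take link down for repair
```

The complication the article mentions is that the traffic controller above this layer must then reroute demand to match the reduced capacity, rather than treating links as simply up or down.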

To help with that, Microsoft has turned its network measurement and data analysis research into a real-time performance and failure monitoring engine for the optical layer of the Azure network (something few network tools look at). It’s also looking at how to feed details about link signal quality to IP traffic engineering algorithms like SWAN and Google B4, so they can take advantage of dynamic capacity links.

The changes Microsoft has asked network vendors for in the past, to improve network speed and reduce the cost and power consumption of optical networking for Azure, have shown up in commercial systems from Cisco, Arista, Inphi, and ADVA. Dynamic fiber connections are still a complex option even for cloud providers, but further down the line they could prove to be a way of increasing capacity and availability between your own data centers at lower cost than leasing new fiber.
