It has been a year since Facebook announced that its Open Compute Project had an initiative focused on defining a network switch that could be used with a variety of operating systems, so that data center operators would not get locked into using a single vendor’s software once they bought that vendor’s hardware.
Facebook’s wish to disaggregate networking hardware from networking software has now been granted. Two switch designs (one by Mellanox and the other by Broadcom) were submitted to Open Compute for approval, and Facebook is already testing a handful of the Broadcom boxes in production in its data centers, Najam Ahmad, director of network engineering at Facebook, said.
Facebook started the Open Compute Project in 2011 as an open source data center and hardware design effort. OCP has since grown into an active ecosystem of vendors and end users focused on web-scale data center infrastructure.
Customizing throughout the stack
The beauty of an OCP switch is that Facebook will soon be able to deploy network hardware made by a variety of vendors and run its own network management software on it. “The key idea was, if we actually disaggregate, we can mix and match,” Ahmad said. “Buy some and make some. We don’t have to buy a complete vertically integrated solution at this point. That’s really why we’re driving it.”
Given its substantial in-house engineering capabilities, Facebook wants the flexibility to choose between different vendors’ solutions and its own homegrown technology across its entire stack. This allows it to optimize the whole system for its applications and to win on price from competition among vendors. It already uses its own servers and storage arrays, both available as open source designs through OCP, and network gear has been the remaining piece of the puzzle.
Today, the test switches running in production are based on Broadcom’s design, but that does not mean hardware by Mellanox, Big Switch Networks or another vendor will not also be deployed in the future. “We expect a lot more switches to come through the OCP pipeline in that manner,” Ahmad said.
Both the Broadcom and Mellanox designs are close to being approved as official OCP designs, which will mean anybody will be able to manufacture and sell them.
Any OS Facebook’s heart desires
Facebook can use its home-brewed network management software on OCP switches because of one key piece of technology called Open Network Install Environment (ONIE). ONIE is an open source boot loader: a switch that ships with it boots up, then finds and loads whatever operating system is available on the network.
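The general idea of a boot loader looking for an installer on the network can be illustrated with a small sketch. This is not ONIE's actual discovery logic, which the article does not describe; the file names, the `candidate_names` and `discover_installer` functions, and the vendor and machine strings are all hypothetical.

```python
from typing import Iterable, List, Optional


def candidate_names(vendor: str, machine: str) -> List[str]:
    """Hypothetical installer names, ordered from most to least specific."""
    return [
        f"installer-{vendor}-{machine}",
        f"installer-{vendor}",
        "installer",
    ]


def discover_installer(available: Iterable[str], vendor: str, machine: str) -> Optional[str]:
    """Return the most specific installer name that the server offers, if any."""
    offered = set(available)
    for name in candidate_names(vendor, machine):
        if name in offered:
            return name
    return None  # nothing suitable found; a real loader would keep retrying


# Assumed example: the server offers a vendor-wide image and a generic one.
server_files = ["installer-acme", "installer"]
chosen = discover_installer(server_files, vendor="acme", machine="sw48")
print(chosen)
```

The point of the fallback order is that the same switch hardware can pick up whichever operating system image the operator chooses to publish, without the hardware vendor being involved.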
The ONIE project was founded by Broadcom, Mellanox, Big Switch, Agema, EdgeCore, Penguin Computing, Quanta and Cumulus Networks. The software was contributed to the Open Compute Project in November of 2013.
SDN for path selection at the edge
Facebook’s network operating system is a Linux variant. The company has a Software Defined Network controller for centralized network management. “We’re very big believers in SDN,” Ahmad said.
One example where SDN helps is selecting the optimal network path for data at the edges of the network around the globe. The primary protocol Facebook uses at the edge is BGP, which is good for setting up sessions, path discovery and policy implementation, but not very good at path selection. BGP selects the shortest path and uses it without considering capacity or congestion, Ahmad explained. Facebook’s SDN controller looks at the paths BGP discovers and selects the best one using an algorithm that also takes the state of the network into consideration, ensuring the most efficient content delivery.
The network is not making routing decisions at the edge on its own. The decisions are instead made by a central controller and pushed back into the fabric. As a result, Facebook has been able to increase utilization of its network resources to more than 90 percent, while running the application without any packet backlog, Ahmad said.
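The difference between picking the shortest path and picking a path with the network's state in mind can be shown in a few lines of Python. This is a minimal sketch, not Facebook's algorithm, which the article does not detail: the `Path` fields, the hop count as a stand-in for BGP's notion of path length, the 0.9 utilization cap and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    name: str
    hops: int             # stand-in for the path length BGP prefers to minimize
    capacity_gbps: float
    load_gbps: float

    @property
    def utilization(self) -> float:
        return self.load_gbps / self.capacity_gbps


def shortest_path(paths: List[Path]) -> Path:
    """Length-only choice: ignores capacity and congestion."""
    return min(paths, key=lambda p: p.hops)


def controller_choice(paths: List[Path], demand_gbps: float,
                      max_utilization: float = 0.9) -> Path:
    """Shortest path that can absorb the demand without exceeding the cap.

    Falls back to the least utilized path when no path fits.
    """
    fits = [p for p in paths
            if (p.load_gbps + demand_gbps) / p.capacity_gbps <= max_utilization]
    if not fits:
        return min(paths, key=lambda p: p.utilization)
    return min(fits, key=lambda p: (p.hops, p.utilization))


# Assumed example data: the shortest path is also the busiest one.
paths = [
    Path("A", hops=2, capacity_gbps=100, load_gbps=88),
    Path("B", hops=3, capacity_gbps=100, load_gbps=40),
    Path("C", hops=4, capacity_gbps=100, load_gbps=10),
]

print(shortest_path(paths).name)                      # length-only choice
print(controller_choice(paths, demand_gbps=5).name)   # load-aware choice
```

In this example the length-only rule keeps sending traffic over path A even though it is nearly full, while the load-aware rule moves the new demand to path B, the shortest path that still has room.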
SDN for bulk Hadoop data transfer
Another example of SDN implementation at Facebook is management of bulk data transfer between data centers. The company’s Hadoop system lives across multiple facilities, which means a lot of traffic is traveling between data centers just for Hadoop. Such transfers, which often involve several terabytes of data, cause substantial congestion in different parts of the network. “You can see congestion for hours at a time,” Ahmad said.
Facebook’s bulk traffic management system enables applications to register what data they need to copy where and over what period of time. It then automatically identifies paths on the network with available capacity and uses that capacity to transfer the requested data. It essentially reshapes inter-data center traffic to avoid congestion that can affect performance of other applications.
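A registration-and-placement scheme of this kind can be sketched as follows. It is an illustration under stated assumptions, not a description of Facebook's system: the `TransferRequest` and `Link` types, the greedy placement in `place_transfer`, the data center names and the traffic figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TransferRequest:
    app: str
    src: str
    dst: str
    terabytes: float
    window_hours: float

    @property
    def required_gbps(self) -> float:
        # 1 TB = 8,000 gigabits (decimal units), spread evenly over the window.
        return self.terabytes * 8000 / (self.window_hours * 3600)


@dataclass
class Link:
    name: str
    src: str
    dst: str
    capacity_gbps: float
    load_gbps: float

    @property
    def spare_gbps(self) -> float:
        return max(self.capacity_gbps - self.load_gbps, 0.0)


def place_transfer(request: TransferRequest,
                   links: List[Link]) -> Optional[Dict[str, float]]:
    """Greedily split the transfer over links with spare capacity.

    Returns {link name: Gbps} and reserves that capacity, or None
    (reserving nothing) if the transfer cannot fit in its window.
    """
    candidates = sorted(
        (l for l in links
         if l.src == request.src and l.dst == request.dst and l.spare_gbps > 0),
        key=lambda l: l.spare_gbps, reverse=True)
    remaining = request.required_gbps
    plan: Dict[str, float] = {}
    for link in candidates:
        if remaining <= 1e-9:
            break
        share = min(link.spare_gbps, remaining)
        plan[link.name] = share
        remaining -= share
    if remaining > 1e-9:
        return None
    for link in candidates:
        link.load_gbps += plan.get(link.name, 0.0)
    return plan


# Assumed example: 13.5 TB from dc1 to dc2 within two hours needs 15 Gbps.
links = [
    Link("east", "dc1", "dc2", capacity_gbps=40, load_gbps=34),
    Link("west", "dc1", "dc2", capacity_gbps=40, load_gbps=30),
    Link("north", "dc1", "dc3", capacity_gbps=40, load_gbps=5),
]
request = TransferRequest("hadoop-copy", "dc1", "dc2", terabytes=13.5, window_hours=2)
plan = place_transfer(request, links)
print(plan)
```

Because the application declares both the volume and the time window up front, the scheduler can translate the request into a bandwidth requirement and fit it into capacity that would otherwise sit idle, rather than letting the copy compete with other traffic on whichever path routing happens to choose.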