Having customized its servers and storage to optimize for its applications and let its developers roll out new software features at lightning speed, Facebook had one remaining "black box" in its infrastructure: networking switches, with their tightly coupled, vendor-designed proprietary software and hardware.
This will no longer be the case at Facebook. The company has designed its own top-of-rack switch and a Linux-based operating system for it, both of which Jay Parikh, its vice president of infrastructure engineering, previewed at GigaOm Structure this morning.
Companies with data center infrastructure of Facebook's scale, such as Google, Amazon and Microsoft, design their own hardware because the approach yields better performance for their specific applications than off-the-shelf products can offer. It also saves them a lot of money, since they can source the hardware from multiple competing manufacturers and don't have to pay the high margins incumbent hardware vendors tack onto their products.
Last year, Facebook’s open source hardware design initiative Open Compute Project started work on a switch that would work with any operating system. Earlier this month, Facebook’s director of network engineering Najam Ahmad told Data Center Knowledge that the company was already testing a handful of OCP switches in production in its data centers.
The switch Parikh announced this morning was a separate effort, outside of OCP. Facebook does, however, plan to open source the design through the initiative, he said.
Disaggregation of the network
The main idea behind the box, nicknamed “Wedge,” is disaggregation of individual components. Facebook has pursued disaggregation across its entire infrastructure stack. It enables the company to upgrade individual server components, such as CPUs or network interface cards, instead of ripping and replacing the entire box. It also allows for easy configuration of machines to optimize for different purposes.
Wedge gives the company’s networking hardware the same chameleon properties. Both the hardware and the software are split into modules that can be mixed and matched, and Facebook can innovate and upgrade each of these components one at a time.
The central modular hardware feature of Wedge is the OCP “Group Hug” motherboard, originally developed for microservers. A single Group Hug board can accommodate Intel, AMD or ARM processors.
Besides configurability, using a server board makes the box behave less like a switch and more like a server. “It basically takes the switch and it turns the switch into … another server,” Parikh said.
The advantage is that Facebook can provision and manage these switches using the same Linux-based operating environment it uses for its server fleet. "This enables us to deploy, monitor and control these systems alongside our servers and storage, which in turn allows our engineers to focus more on bringing new capabilities to our network and less on managing the existing systems," Facebook's Yuval Bachar and Adam Simpkins wrote in a blog post.
OS turns software engineers into network engineers
The company designed FBOSS, the operating system for Wedge, to leverage the software libraries and systems it uses for managing the server fleet. These include provisioning, decommissioning, upgrades, downgrades, draining and undraining.
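The lifecycle operations listed above can be pictured as a simple state machine, the same one a fleet-management system applies to servers. The sketch below is purely illustrative (the class, state names and version string are invented for this example, not taken from FBOSS):

```python
from enum import Enum, auto


class SwitchState(Enum):
    PROVISIONED = auto()
    IN_SERVICE = auto()
    DRAINED = auto()


class ManagedSwitch:
    """Hypothetical switch managed through a server-style lifecycle."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.state = SwitchState.PROVISIONED
        self.version = None

    def bring_into_service(self):
        self.state = SwitchState.IN_SERVICE

    def drain(self):
        # Shift traffic away before maintenance, just as with a server.
        self.state = SwitchState.DRAINED

    def upgrade(self, version):
        assert self.state == SwitchState.DRAINED, "drain before upgrading"
        self.version = version

    def undrain(self):
        self.state = SwitchState.IN_SERVICE


# A rolling upgrade on one switch: drain, upgrade, undrain.
sw = ManagedSwitch("rack42-tor")
sw.bring_into_service()
sw.drain()
sw.upgrade("os-2014.06")
sw.undrain()
print(sw.state.name)  # IN_SERVICE
```

The point of reusing the server workflow is that no switch-specific tooling is needed for routine operations like this rolling upgrade.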
Because it can program the switch hardware directly, Facebook can implement its own forwarding software faster. The company's Software Defined Network controller makes routing decisions centrally to ensure optimal data delivery around the world.
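Centralized routing of this kind means a controller, not each switch, picks the path and pushes the result down into the switch's forwarding table. A minimal sketch of that division of labor (the path data, switch names and `install` method are hypothetical, not Facebook's actual controller API):

```python
def choose_path(paths):
    """Central decision: pick the candidate path with the lowest total link cost."""
    return min(paths, key=lambda path: sum(cost for _hop, cost in path))


class Switch:
    """A switch that only installs forwarding entries it is handed."""

    def __init__(self, name):
        self.name = name
        self.forwarding_table = {}

    def install(self, destination, next_hop):
        self.forwarding_table[destination] = next_hop


# Two candidate paths to a destination, as (next_hop, link_cost) hops.
candidates = [
    [("spine1", 10), ("edge-eu", 5)],   # total cost 15
    [("spine2", 3), ("edge-eu", 5)],    # total cost 8
]

best = choose_path(candidates)          # controller decides centrally
tor = Switch("rack42-tor")
tor.install("edge-eu", best[0][0])      # switch just installs the entry
print(tor.forwarding_table)             # {'edge-eu': 'spine2'}
```

The contrast with a traditional switch is that here the box holds no routing protocol logic at all; it is a dumb executor of decisions made with a global view.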
FBOSS also puts an abstraction layer on top of the switch's ASIC APIs, which enables engineers to treat the switch like any other service at Facebook. "With FBOSS, all our infrastructure software engineers instantly become network engineers," Bachar and Simpkins wrote.
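The value of such an abstraction layer is that service code depends on a vendor-neutral interface rather than on any one ASIC SDK. A minimal sketch of the pattern, with an entirely invented interface and a stand-in "vendor" implementation (FBOSS's real APIs are not shown here):

```python
class SwitchASIC:
    """Hypothetical vendor-neutral interface over a switch ASIC's API."""

    def add_route(self, prefix, next_hop):
        raise NotImplementedError


class VendorXASIC(SwitchASIC):
    """Stand-in for one vendor's implementation; a real one would call the vendor SDK."""

    def __init__(self):
        self.routes = {}

    def add_route(self, prefix, next_hop):
        self.routes[prefix] = next_hop


def program_routes(asic, routes):
    # Application code is written against SwitchASIC only, so a software
    # engineer can work on the switch without knowing the underlying chip.
    for prefix, next_hop in routes.items():
        asic.add_route(prefix, next_hop)


asic = VendorXASIC()
program_routes(asic, {"10.0.0.0/8": "192.168.1.1"})
print(asic.routes)  # {'10.0.0.0/8': '192.168.1.1'}
```

Swapping in a different ASIC then means writing one new subclass, not rewriting every service that touches the switch.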