This month we focus on data centers built to support the cloud. As cloud computing becomes the dominant form of IT, it exerts a growing influence on the industry, from infrastructure and business strategy to design and location. Web-scale giants like Google, Amazon, and Facebook have perfected the art and science of cloud data centers. The next wave is bringing the cloud data center to enterprise IT... or the other way around!
Following the example of other web-scale data center operators such as Google and Facebook, the infrastructure engineering team behind the professional social network LinkedIn has designed its own data center switch to replace networking gear from the major vendors, saying off-the-shelf products were inadequate for the company's needs.
LinkedIn has successfully tested its first-generation switch, which combines hardware from a contract design manufacturer, merchant silicon, and the company's homegrown Linux-based software. It plans to deploy the switch at scale for the first time in its upcoming Oregon data center, located in an Infomart Datacenters facility, which will also be the first site built to LinkedIn's own data center design.
LinkedIn's data center switch, called Pigeon, is a 3.2Tbps (32 ports of 100G) platform that can serve as either a leaf or a spine switch. It is built on Broadcom's Tomahawk 3.2Tbps merchant silicon and runs a Linux-based operating system.
Base architecture of Pigeon, LinkedIn's first 100G data center switch (Image: LinkedIn)
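The headline figure is simple arithmetic: 32 ports at 100 Gbps each add up to 3.2 Tbps. A minimal sketch of that calculation follows, along with the oversubscription ratio a leaf would have under a few port splits between server-facing and spine-facing links. The splits are hypothetical examples chosen for illustration; the article does not say how LinkedIn allocates Pigeon's ports.

```python
PORTS = 32             # front-panel ports on Pigeon (from the article)
PORT_SPEED_GBPS = 100  # 100G per port (from the article)


def aggregate_tbps(ports: int, speed_gbps: int) -> float:
    """Sum of all port speeds, converted from Gbps to Tbps."""
    return ports * speed_gbps / 1000


def oversubscription(down_ports: int, up_ports: int,
                     speed_gbps: int = PORT_SPEED_GBPS) -> float:
    """Ratio of server-facing to spine-facing bandwidth on a leaf switch."""
    return (down_ports * speed_gbps) / (up_ports * speed_gbps)


if __name__ == "__main__":
    print(f"Aggregate capacity: {aggregate_tbps(PORTS, PORT_SPEED_GBPS)} Tbps")
    # Hypothetical leaf port splits (down, up) that use all 32 ports.
    for down, up in [(16, 16), (24, 8)]:
        print(f"{down} down / {up} up -> {oversubscription(down, up)}:1")
```

A 16/16 split gives a non-blocking 1:1 leaf, while 24/8 trades uplink capacity for more server ports at 3:1; either way the same 32-port box can also act as a spine.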
Google was the first web-scale (or cloud) data center operator to build its own switches. Facebook began designing its own networking technology several years ago.
After introducing Wedge, its first 40G top-of-rack data center switch, in 2014, Facebook rolled out its own data center switching fabric and an aggregation switch, called Six Pack. Last year, the company announced it had designed its first 100G switch, which it plans to start deploying at scale in the near future.
Companies like Facebook, Google, Microsoft, and Amazon have built global data center infrastructure of unprecedented scale, finding along the way that off-the-shelf technology often doesn't meet their needs and building in-house solutions that work better for them.
LinkedIn's user base has been growing rapidly, and the company is starting to face similar challenges. Its membership grew from 55 million at the end of 2009 to nearly 400 million as of the third quarter of last year, according to Statista. Its 10MW data center lease with Infomart earned it a spot on the list of the 10 companies that leased the most data center space last year.
LinkedIn's network latency problems began three years ago. After spending some time trying to address them, the engineers concluded that they would have to build a data center networking solution from scratch.
“We were not scaling our network infrastructure to meet the demands of our applications – high speed, high availability, and fast deployments,” Zaid Ali Kahn, LinkedIn’s director of global infrastructure architecture and strategy, said in a blog post. “We knew we needed greater control of features at the network layer, but we hit a roadblock on figuring out how.”
The team traced its latency problems to sub-second microbursts of traffic. These were difficult to detect because commercial switch vendors don't expose the buffers inside the third-party merchant silicon their switches use. Visibility into that merchant silicon became one of the design goals for Pigeon.
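Why coarse monitoring misses microbursts can be shown with a small, purely illustrative sketch. It assumes hypothetical buffer-occupancy readings (as a percent of buffer capacity) sampled every millisecond for one second; the data, the 80% threshold, and the function name `find_microbursts` are all made up for this example and do not describe Pigeon's actual telemetry. The one-second average looks nearly idle, while scanning the fine-grained samples reveals two bursts lasting only a few milliseconds.

```python
from statistics import mean

SAMPLE_INTERVAL_US = 1000  # hypothetical: one sample per millisecond
THRESHOLD = 80             # hypothetical: flag occupancy at or above 80%

# Synthetic 1-second trace: mostly idle, with two short bursts.
samples = [2] * 1000
for i in range(300, 305):  # 5 ms burst
    samples[i] = 95
for i in range(700, 703):  # 3 ms burst
    samples[i] = 90


def find_microbursts(samples, threshold, sample_interval_us):
    """Return (start_us, duration_us, peak) for each run of samples >= threshold."""
    bursts = []
    start = None
    peak = 0
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start, peak = i, value  # a new burst begins
            else:
                peak = max(peak, value)
        elif start is not None:
            # Burst just ended; record where it started and how long it lasted.
            bursts.append((start * sample_interval_us,
                           (i - start) * sample_interval_us, peak))
            start = None
    if start is not None:  # burst still running at the end of the trace
        bursts.append((start * sample_interval_us,
                       (len(samples) - start) * sample_interval_us, peak))
    return bursts


if __name__ == "__main__":
    print(f"1-second average occupancy: {mean(samples):.3f}%")
    for start_us, dur_us, peak in find_microbursts(samples, THRESHOLD, SAMPLE_INTERVAL_US):
        print(f"burst at {start_us} us, lasting {dur_us} us, peak {peak}%")
```

The point of the design is the sampling granularity: any counter averaged over a full second smooths these spikes away, which is why direct access to the chip's buffer state matters.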
LinkedIn's data center network vendors were also slow to fix software bugs, and they built features into their products that the company didn't need. Some of those unneeded features had bugs of their own, which LinkedIn then had to work around.
The engineers also wanted to use standard Linux automation tools, such as Puppet and Chef, along with more modern monitoring and logging software. Finally, scaling vendor software licenses and support had simply become too expensive. All of these concerns echo the reasons other web-scale data center operators have given for turning to custom technology designed in-house.