This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we’ll zero in on some of those changes to get a better understanding of the pervasive open source data center.
The perfect data center masks the complexity of the hardware it houses from the requirements of the software it hosts. Compute, memory, and storage capacities are all presented to applications and services as contiguous pools. Provisioning these resources has become so automated that it’s approaching turnkey simplicity.
This is the part of the story where the open source movement stands up, takes a bow, and thanks its various supporters, agents, and parents for making everything possible for it. To say that open source efforts are responsible for the current state of data center networking would be like crediting earthquakes for the shapes of continents. Yes, they play an undeniably formative role. But the outcome is more often the result of all the elements — many of them random — they put into play.
One of these elements is a trend started by virtualization — the decoupling of software from the infrastructure that supports it. Certain open source factions may be taking credit for the trend of disaggregation now, but the next few years of history may record it as something more akin to a gathering storm.
“A very fundamental architectural principle that we believe in is, first of all, we want a platform in the future that allows hardware innovation and software innovation to grow independently,” said Equinix CTO Ihab Tarazi, in a keynote address to the Linux Foundation’s Open Networking Summit in Santa Clara, California, earlier this month.
“I don’t think the industry has that today,” Tarazi continued. “Today, if you innovate for hardware, you’re still stuck with this specific software platform, and vice versa; and all the new open source software may not have... a data center, without customized adoption in specific areas by service providers. So what we want to create in our data center is the platform that allows the new explosion of hardware innovation that’s coming up everywhere, in optics and switching and top-of-rack — all of that, to have a home, to be able to connect to a platform independently of software. And also we want all the software innovation to happen independently of the hardware, and be able to be supported.”
From Many Routes to One CORD
It isn’t so much that open source, in and of itself, is enabling this perfect decoupling of hardware from software that Tarazi envisions. Rather, it’s the innovation in data center automation and workload orchestration happening within the open source space in just the past three years that is compelling the proprietors of the world’s largest data centers to change their entire philosophy about the architecture, dynamics, and purpose of networks. Telecommunications providers especially now perceive their networks as data centers, and not just in the figurative sense.
“We want to scale out [the network] in the same way that we scale out compute and storage,” explained AT&T SDN and NFV engineer Tom Anschutz, speaking at the event. It’s part of AT&T’s clearest signal to date that it’s impressed by the inroads made by Docker and the open source champions of containerization at orchestrating colossal enterprise workloads at scale. But it wants to orchestrate traffic in a similar way, or as similar as physics will allow, and it wants open source to solve that problem, too.
Last June, AT&T went all-in on this bet, joining with the Open Networking Lab (ON.Lab) and the Open Network Operating System (ONOS) Project to form what’s now called Central Office Re-imagined as a Datacenter (CORD, formerly “Re-architected”). Its mission is to make telco infrastructure available as a service in an analogous fashion to IaaS for cloud service providers.
Anschutz showed how a CORD architecture could, conceivably, enable traffic management with the same dexterity with which cloud service providers (CSPs) manage workloads. Network traffic enters and exits the fabric of these re-imagined data centers using standardized interfaces, he explained, and may take any number of paths within the fabric, whose logic is adjusted in real time to suit the needs of the application.
“Because there’s multiple paths, you can also have LAN links that exceed the individual thread of capacity within the fabric,” he said, “so you can create very high-speed interfaces with modest infrastructure. We can add intelligence to these types of switches and devices that mimic what came before, so control planes, management planes, and so forth can be run in virtual machines, with standard, Intel-type processors.”
Disaggregation, Anschutz demonstrated, leads to an easier path between “then” and “now,” by isolating each of the physical elements and functions of the current infrastructure and mapping them to their virtual, SDN-based equivalents. For example, routers and switches map to SDN-controlled virtual appliances; and backplanes map to top-of-rack and end-of-row switches. Theoretically — and optimally — wide varieties of current-day network devices would map to a single software-based equivalent that represents the entire class.
“If I did this just for one box, I’d be a fool,” he said. “But if I did this for all the different kinds of equipment that a carrier has in central offices today, I find there’s a big opportunity to regularize and homogenize the infrastructure, and make it all common across a large swath of different kinds of legacy boxes.”
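The mapping Anschutz described can be pictured as a simple lookup table. The sketch below is only an illustration: the three device kinds come from the examples above, while the inventory names, the class labels, and the function name are hypothetical and not part of CORD.

```python
from collections import defaultdict

# Legacy element -> software-defined equivalent, following the examples
# in the text. The label strings themselves are hypothetical.
LEGACY_TO_VIRTUAL = {
    "router": "sdn_virtual_appliance",
    "switch": "sdn_virtual_appliance",
    "backplane": "tor_or_eor_switch",
}

def group_by_virtual_class(inventory):
    """Group (name, legacy_kind) pairs by their virtual equivalent."""
    groups = defaultdict(list)
    for name, kind in inventory:
        groups[LEGACY_TO_VIRTUAL[kind]].append(name)
    return dict(groups)

# Hypothetical central-office inventory.
inventory = [
    ("edge-router-1", "router"),
    ("agg-switch-7", "switch"),
    ("chassis-backplane-3", "backplane"),
]
print(group_by_virtual_class(inventory))
```

Two different legacy boxes land in the same class here, which is the "regularize and homogenize" effect in miniature.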
From Many Servers to One Box
In a perfect world, engineers might prefer to build one-to-one virtual equivalents for every current-day network function and appliance, and work out the homogeneity of those devices over time. But carriers may have already run out of time.
“What we see in all of this, in terms of consumer and enterprise traffic that lies ahead — especially with Internet of Things — is [it’s] going to put even heavier demands on this infrastructure,” said Manish Singh, VP for SDN/NFV product management at India-based IT consultancy Tech Mahindra.
“One of the things that we look at is the economies of homogeneity,” he remarked, “which we have seen in the Web-scale properties — the Facebooks and Amazons – the way they have gone about implementing their infrastructure, truly leveraging merchant silicon... and instead of creating vertically integrated appliances really creating an infrastructure where the applications are truly decoupled.”
Major cloud service, data center, and network connectivity providers like Equinix, major network equipment providers like Huawei and Cisco, and major telcos like AT&T are all willing to invest more in development of open source networking standards. They’re willing to sacrifice any competitive advantage they may receive from ownership of the final solution if it means being able to shed their old, costly product lines, service offerings, and infrastructure years sooner.
Perhaps the very symbol of homogeneity in data center hardware has been the Open Compute Project, the Facebook-led effort to pool together the buying power of huge data centers looking to drive down the costs of commodity hardware even further. The biggest subscribers to OCP are the heaviest consumers of open source software, including cloud service providers and telcos — data centers that, frankly, have grown beyond Web-scale.
As Facebook’s engineering director and OCP co-leader Omar Baldonado told ONS attendees, his team realized that the only way to achieve efficiencies and operational advantages in both cost and production, for an infrastructure of Facebook’s scale, was to start producing some of that infrastructure itself.
“We weren’t seeing the solutions out there that we needed to work at this hyper scale,” said Baldonado. “So we took a whole-stack view of the data center and looked at all the different hardware pieces that we needed to address to make this happen, as well as all the different software pieces.”
That deep investigation of Facebook’s long-term infrastructure requirements led his team to realize it needed to systematically decompose the interfaces and connections between hardware and software into discrete classes. It was a process that led Facebook to discover disaggregation.
“The point of disaggregating is to get laser focus on different components and making the best-of-breed the best,” Baldonado explained. “And if that means making the best power supply... or a new 48V rack, or Just a Bunch of Flash (JBOF), or some of our AI compute infrastructure for machine learning — get people focused on that, make those pieces just awesome. Don’t worry about all the pieces that need to run on top of them; make awesome building blocks.”
From Many Paths to... Many Other Paths?
At ONS, members of the ONOS project staged some of the first demonstrations of CORD architecture and ONOS technology at work. For residential customers, CORD’s objective is to replace what service providers call Customer Premises Equipment (CPE), those unwanted appendages cable TV customers are forced to attach to their sets, with virtual Subscriber Gateways, or vSGs. Virtual services under CORD can be orchestrated in the data center and streamed to devices through an interface that is either more like, or is, the Web. Consumers subscribe to functions such as television and phone service, and those functions are virtualized and orchestrated on either virtual or bare metal servers.
For now, at least, orchestration involves Docker containers. As of last month, ONOS reported, the best scale-out it was able to achieve was around 1,000 containers per server on bare metal, and 2,000 containers on a platform where containers were run inside VMs. Each container runs what CORD calls a subscriber bundle, so the subscriber function is virtualized by a Docker container.
ONOS may not yet be using the optimal scaling strategy for virtual platforms, as enterprise data centers that have scaled out with VMware vSphere will attest. The word “microservices” did come up a few times in discussion at ONS, though it was obvious not everyone was certain what it meant. A platform that scales more like Netflix’s microservices-based model may yet make ONOS and CORD more efficient.
The ONOS demo does run on OpenStack, but it only makes use of its Nova compute engine for general purposes, we were told; the Neutron networking engine (which supports GRE and VXLAN overlay networks) is only utilized for the metadata store. Because ONOS must scale to a much wider extent than an ordinary enterprise workload — and also, some say, because Neutron is just too complicated — ONOS must manage its own networking and cannot rely merely upon OpenStack.
The ONOS Project presents subscriber functions as a network service graph (simplified, like a subway map) and as a live logical resource map. As the ONOS team explained, ONOS, working with CDN provider Akamai, developed a way to move both subscriber functions and content as close to the edge of the network as possible, in order to expedite handling. Imagine a world where all of the customer service functions of a major multi-channel video provider are handled from the office and don’t require local service calls.
What does this mean for the data center? A sizable, though not exorbitant, amount of homogeneous hardware, most likely a derivative of OCP, would absorb the functions and services currently being provided by CPE boxes. Divide the number of customers served by about 1,000 (the number of subscriber containers ONOS reported per bare metal server), then by about 40 for the number of 1U servers you can fit in a tall rack, and a theoretical video service provider handling some 22 million customers nationwide should prepare to host some 550 racks throughout the continent. That’s within reason.
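The back-of-the-envelope estimate above can be checked in a few lines. This is a minimal sketch using only the article's round numbers: one subscriber per container, about 1,000 containers per bare metal server, and about 40 1U servers per rack. The function name and its parameters are illustrative.

```python
import math

def racks_needed(customers, subscribers_per_server=1000, servers_per_rack=40):
    """Estimate the servers and racks needed for one container per subscriber."""
    servers = math.ceil(customers / subscribers_per_server)
    racks = math.ceil(servers / servers_per_rack)
    return servers, racks

servers, racks = racks_needed(22_000_000)
print(servers, racks)  # 22000 550
```

Plugging in the 2,000-containers-per-server figure reported for containers inside VMs would halve the estimate to 275 racks, assuming everything else stays the same.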
The scale required by a telecommunications service provider such as AT&T, however, has yet to be fully fathomed. At ONS, the carrier gave attendees a first look at ECOMP, its architecture for the orchestration and management of its next-generation data center. The company is looking for what it describes as guidance and input from the open source community. Without setting a firm deadline, chief strategy officer John Donovan did intimate on more than one occasion that one year’s time would be too long.
No Open SDN Platform Consensus on Horizon
Why only a matter of months? AT&T and other service providers worldwide are looking for greater consensus from open source as to how they want to proceed with their own projects (Huawei called it a “common intent framework”). While ONOS is one prominent SDN platform in which AT&T participates, OpenDaylight is another; and despite rumors to the contrary, ONS made it clear to attendees that both platforms seem content to remain in blissful, overlapping co-existence with one another.
That wasn’t the message that many carriers and service providers wanted to take with them from last week’s conference. As AT&T and others stated explicitly, the whole purpose of moving toward an open source platform in the first place was to embrace a consensus platform rather than support a multitude of options, some of which are fragmented, others fractured, and others kept on life support. AT&T’s goal is to move from nearly 6 percent to about 30 percent total network virtualization by the end of this year, and from 74 OpenStack-based integrated cloud zones to 105 by the same deadline.
“We think service providers have to go to a white-box model if they want to survive,” announced Donovan at ONS. “It’s not just about pulling parts off a shelf, slapping them together, and making your own servers. We’re collaborating with the component manufacturers on the roadmaps for designing and building the silicon, so we can get our customers the capabilities they need as fast as possible.”
“We could debate for a year how to do this,” he said at one point. “And we’re just not in a position today to debate. I can’t tell you how much we operate as if our hair is on fire.”
For now, some industry leaders get the feeling they’re asking more questions than they’re receiving answers.
“The reality is that we are almost stuck at ideation,” said Tech Mahindra’s Singh. “How do we go from ideation to implementation? We rarely see large-scale deployments yet. So the challenge for the industry is, how do we cross the chasm?”
If open source wants to continue its claim of transforming data center and carrier networks by way of disaggregation, it may need to forge a single hammer out of the bundle of sticks it’s currently wielding.