Google today revealed the latest in a series of technologies it has developed to improve the performance of its global data center infrastructure. Espresso is a Software-Defined Networking stack that speeds up Google’s services for end users at the point where the company’s network hands traffic off to third-party internet service providers, who carry it the last mile.
While the company has customized nearly every component of its network, from data centers and servers to networking switches and the software that orchestrates all this infrastructure, it does not control the infrastructure of the ISPs that actually deliver its services to consumers.
For that, Google (like all other internet companies) has to peer with ISPs in data centers in various metros around the world. Interconnecting with ISPs in 70 metros and generating 25 percent of all internet traffic, the company says its network’s peering surface is one of the largest in the world. Those peering points represent the edge of Google’s network.
“We found that existing internet protocols cannot use all of the connectivity options offered by our ISP partners, and therefore aren’t able to deliver the best availability and user experience to our end users,” Amin Vahdat, a Google Fellow, and Bikash Koley, a distinguished engineer at the company, wrote in a blog post. Vahdat revealed the new architecture at the Open Networking Summit in Santa Clara, California, Tuesday.
Espresso is the fourth pillar of Google’s SDN strategy, the others being the company’s data center interconnect, called Jupiter; its software-defined WAN, called B4; and Andromeda, its network virtualization stack. Of the four, Espresso was the most challenging, the engineers wrote.
It improves user experience in two ways. The first is automatic selection of the best data center location to serve a particular user from, based on real-time performance measurements.
“Rather than pick a static point to connect users simply based on their IP address (or worse, the IP address of their DNS resolver), we dynamically choose the best point and rebalance our traffic based on actual performance data. Similarly, we are able to react in real-time to failures and congestion both within our network and in the public internet,” Vahdat and Koley wrote.
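The idea of rebalancing on measured performance rather than a static IP-to-site mapping can be sketched in a few lines. This is an illustrative toy, not Espresso’s implementation; the site names and RTT figures are invented for the example.

```python
# Toy sketch (not Google's actual system): pick the serving site with the
# lowest recent median round-trip time for a client population, instead of
# mapping the client's IP address to a fixed site.
from statistics import median

# Hypothetical recent RTT samples (ms) per front-end site.
rtt_samples = {
    "site-eu-west": [42, 45, 41, 44],
    "site-us-east": [95, 90, 160, 93],   # congestion spike pushes it out
    "site-asia-ne": [120, 118, 121, 119],
}

def pick_site(samples: dict[str, list[float]]) -> str:
    """Return the site with the lowest median RTT for this client."""
    return min(samples, key=lambda site: median(samples[site]))

print(pick_site(rtt_samples))  # site-eu-west
```

Because the choice is recomputed from fresh measurements, a congestion spike at one site (as in the `site-us-east` samples above) automatically shifts traffic elsewhere, which is the behavior the quote describes.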
The second is separation of logic and control of traffic from individual hardware routers. A single distributed system aggregates network information and makes routing decisions rather than relying on thousands of individual routers to manage packet streams.
“We leverage our large-scale computing infrastructure and signals from the application itself to learn how individual flows are performing, as determined by the end user’s perception of quality.”
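The control-separation idea can also be sketched: forwarders only report measurements, while one central controller aggregates them and makes the routing decisions. Again a hypothetical toy under invented names, not Espresso’s design.

```python
# Toy sketch (not Espresso's implementation): a single controller
# aggregates link reports from simple forwarders and picks egress peers
# centrally, rather than each router choosing paths on its own.

class Controller:
    def __init__(self):
        # (router, peer) -> most recently reported cost (e.g., latency)
        self.link_cost: dict[tuple[str, str], float] = {}

    def report(self, router: str, peer: str, cost: float) -> None:
        """Forwarders only report measurements; they hold no routing logic."""
        self.link_cost[(router, peer)] = cost

    def best_peer(self, router: str) -> str:
        """Centralized decision: cheapest egress peer for this router."""
        peers = {p: c for (r, p), c in self.link_cost.items() if r == router}
        return min(peers, key=peers.get)

ctrl = Controller()
ctrl.report("edge-1", "isp-a", 12.0)
ctrl.report("edge-1", "isp-b", 7.5)
ctrl.report("edge-2", "isp-a", 4.0)
print(ctrl.best_peer("edge-1"))  # isp-b
```

The point of the split is that edge devices stay simple and stateless about policy; updating the decision logic means changing one distributed service, not reconfiguring thousands of routers.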