Google, IBM, Others Pitch Open Standard for Cloud Server Design

A group of tech giants working to mount a serious challenge to Intel in the data center has previewed OpenCAPI, an upcoming open standard for interconnecting components inside a server, which the group is positioning as an alternative to Intel’s proprietary technology.

The group includes Google; hardware vendors IBM, HP Enterprise, and Dell EMC; and Intel’s more direct rivals AMD and Nvidia, among others. IBM’s upcoming Power9 processors, expected to launch next year, will support the standard, as will the IBM servers built around them.

Intel currently dominates the market for server chips, and hyperscale data center operators like Google, which spend enormous amounts of money on hardware every quarter, want a viable alternative. They have generally adopted a multi-vendor strategy for sourcing nearly all components of their infrastructure, but it’s difficult to extend that strategy to processors given the size of Intel’s lead in the market.

OpenCAPI and Power9 are aimed at the high end of the server market – computers used for data-intensive analytics workloads or machine learning. The group claims that the standard will be capable of boosting server performance tenfold.

That performance improvement comes from two things: higher bandwidth on the links between CPUs and accelerators, and cache coherency, which lets the CPU and accelerators share a consistent view of data in memory. In practice this means data needs to be copied around less within the system as it is being processed, saving resources.


Accelerators, or additional processors that take on a portion of the CPU’s workload to free up its resources, have been a mainstay in supercomputing for years, but they are now becoming increasingly important in server architecture for cloud data centers and for the fast-emerging field of machine learning. “The compute model going forward is the marriage between a really good data-centric processor, like Power, and a really good set of acceleration technologies,” Doug Balog, general manager for IBM Power Systems, said in an interview with Data Center Knowledge.

Most accelerators in use today are GPUs, made by the likes of AMD and Nvidia, and some are Intel’s Xeon Phi chips, but FPGAs, or Field-Programmable Gate Arrays, are increasingly being used as accelerators as well. The advantage of FPGAs is that they can be reconfigured as workload needs change.

Intel invested heavily in FPGAs last year, paying $16.7 billion to acquire FPGA specialist Altera. The most prominent user of FPGAs to accelerate cloud workloads is Microsoft, whose latest-generation cloud server design supports the technology.

It’s unclear at this point what kind of architecture will dominate the market for machine-learning hardware. There are divergent views on this today, with companies like Nvidia supporting GPU-accelerated AI servers and Intel saying that model isn’t scalable, pitching the next generation of its Xeon Phi processors – codenamed Knights Mill and expected to hit the market next year – as the better alternative.

Amazon's cloud servers for data-intensive workloads, including machine learning, rely on GPUs, and so does Big Sur, Facebook's open source server design for AI workloads.

Google has designed its own custom chip for machine learning, called the Tensor Processing Unit (TPU). The company hasn’t revealed any details about the TPU’s architecture, saying only that it is an ASIC (Application-Specific Integrated Circuit) and that it is optimized for TensorFlow, Google’s open source software library for building AI applications.

Google is also working with Rackspace on a server design that will run on IBM’s Power9 processors and use the OpenCAPI interface. The companies today released the first draft of the Zaius server spec, which they plan to contribute to the Open Compute Project.

In addition to server and GPU vendors, the OpenCAPI consortium counts an FPGA player among its members: San Jose-based Xilinx plans to support OpenCAPI-enabled FPGAs, according to Friday’s announcement.

IBM’s accelerator strategy has been to support as broad an array of choices as possible. Its current-generation Power8 chip supports Nvidia’s NVLink GPU interconnect, and so will Power9, Balog said.
