Intel Wants to Make Machine Learning Scalable

Intel’s strategy for tackling the AI chip market, where it faces competition from leading GPU makers and potentially from big customers such as Google that design their own specialized processors, rests largely on designing systems that scale out rather than up. Scaling up, according to the chipmaker, is the conventional but inefficient approach to architecting these systems.

Software code in today’s machine learning systems is tough to scale and usually lives in a single box, said Charles Wuischpard, VP of the Intel Data Center Group and general manager of the giant’s HPC Platform Group. (Machine learning is one of the most active subfields of artificial intelligence.)

Companies generally buy high-power scale-up systems filled with GPUs. “In a way, there’s an efficiency loss here,” he said on a call with reporters last week.

This is the first time Intel has publicly discussed its strategy for the AI chip market. “This is an early indicator of our plans in this area,” Wuischpard said.

The company has been working on a scale-out solution for machine learning, taking the cluster approach that’s typical in high-performance computing systems and hyperscale web or cloud applications.

In addition to taking a different architectural approach to machine learning, Intel’s AI chip strategy calls for a platform that handles more than machine learning alone. While some companies will have dedicated machine learning environments, Intel believes that the vast majority will want a single system that can run machine learning as well as other workloads.

“Enabling the most effective and efficient use of compute resource for multiple uses remains one of our underlying themes,” Wuischpard said.

Besides hardware, Intel has been investing in development of software tools and libraries for machine learning, training for partners, and an early access program for top research academics. About 100,000 developers are being trained on machine learning through its partner program, the company said.

Wuischpard talked about Intel’s plans in the AI chip market in the context of a general-availability roll-out of Xeon Phi, its latest processor for high-performance computing. It is the chipmaker’s first bootable host processor designed specifically for highly parallel workloads, and its first part to feature an integrated fabric (Intel Omni-Path) and integrated memory.

The company previewed the part in 2014 and has been shipping it in volume to select customers for several months, but it will become generally available in September, according to Wuischpard.

More than 100,000 units have been sold or are pending, he said, the bulk of them going to major supercomputer labs, such as Cineca in Italy, the Texas Advanced Computing Center, and numerous US Department of Energy national labs.

In the AI chip market, Xeon Phi is Intel’s answer to GPUs from the likes of Nvidia. According to Wuischpard, Phi is faster and more scalable than GPUs.

Phi and GPUs are best suited for a subset of machine learning workloads called training. Another type of machine learning workload, called inference, is already dominated by Intel’s Xeon processors, he said, calling Xeon the most widely deployed processor for machine learning.

Earlier this year, Google announced that it had developed its own custom chip for machine learning, called the Tensor Processing Unit, or TPU, potentially indicating that Intel and other processor makers were unable to produce a part that matched Google's performance and price requirements. Google and other hyperscale data center operators usually invest in engineering infrastructure components in-house when they cannot source them from suppliers in the market.

Wuischpard said Google's TPU appears to be a highly specialized part designed for a specific workload, which makes it only a small threat to Intel's general-purpose strategy in machine learning.

"This is a case where they found a way to develop a specialized use for something that they do at massive scale," he said. "I don’t think it [will turn] out to be as much of a general-purpose solution."

Updated with comments on Google's Tensor Processing Unit.
