Intel Sets FPGA Goal: Two Orders of Magnitude Faster than GPGPU by 2020

Intel used to be capable of delivering process miniaturization and configuration improvements in a predictable, “tick-tock” cadence. That was before the laws of physics stomped on the company’s best-laid plans for miniaturization. So in a special event Thursday in San Francisco, Intel formally unveiled the hardware it expects will take over from CPUs the job of scaling the next big obstacle in performance improvement. The lineup includes a PCI Express-based FPGA accelerator exclusively for deep learning applications, and a customized derivative of its “Knights Landing”-edition Xeon Phi processors.

“I know that today AI has definitely become overhyped,” admitted Barry Davis, Intel’s general manager for its Accelerated Workload Group, in a press briefing prior to the event. “But the reality is that it’s a rapidly growing workload — whether we’re talking about the enterprise, the cloud, everyone is figuring out how to take advantage of AI.”

At least, that’s Intel’s best hope, as the company shoves all its chips into the center of the table, betting on hardware-driven acceleration to carry on the company’s founding tradition.

What’s being called the Deep Learning Inference Accelerator (DLIA) puts to work the Arria 10 FPGA design that Intel acquired last year in its purchase of Altera. DLIA is available for select customer testing today, subject to Intel approval; the accelerator card is slated for general availability in mid-2017.

In a statement released Thursday, Intel Executive Vice President and Data Center Group General Manager Diane Bryant boasted, “Before the end of the decade, Intel will deliver a 100-fold increase in performance that will turbocharge the pace of innovation in the emerging deep learning space.”

That statement came with the requisite footnote under “performance,” reminding readers that such displays of performance will be registered on the company’s benchmarks of choice (including SYSmark), and not necessarily in everyday practice.

As originally conceived, Arria 10 would be put to work in reconfigurable communications equipment such as wireless routers, transceiver towers, and live HDTV video camera gear. But Intel is leveraging it now as a caretaker for a convolutional neural network (CNN, albeit without James Earl Jones).

Think of a CNN as a way that an algorithm can “squint” at an image, reducing its resolution selectively, and determining whether the result faintly resembles something it’s been trained to see beforehand.
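The “squint” in that analogy corresponds loosely to what CNN practitioners call pooling: shrinking an image while keeping its strongest signals. What follows is only a minimal sketch of 2x2 max pooling in plain Python. The function name and the sample values are illustrative assumptions, not anything Intel or the Caffe project has published.

```python
def max_pool_2x2(image):
    """Halve an image's resolution by keeping the largest value in each
    2x2 block. A trailing odd row or column is dropped."""
    pooled = []
    for y in range(0, len(image) - 1, 2):
        row = []
        for x in range(0, len(image[0]) - 1, 2):
            row.append(max(image[y][x], image[y][x + 1],
                           image[y + 1][x], image[y + 1][x + 1]))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 grayscale "image" (values are made up for illustration).
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(image)  # a 2x2 summary of the 4x4 input
```

Each value in the smaller grid stands in for a whole neighborhood of the original, which is roughly what squinting does to fine detail.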

As Facebook research scientist Yangqing Jia, Caffe’s principal contributor, writes, “While the conventional definition of convolution in computer vision is usually just a single channel image convolved with a single filter (which is, actually, what Intel IPP’s [Integrated Performance Primitives] convolution means), in deep networks we often perform convolution with multiple input channels (the word is usually interchangeable with ‘depth’) and multiple output channels.”

If you imagine a digital image as a product of several projections “multiplied” together, you see what Jia is getting at: an emerging picture of something an algorithm is training a system to recognize.
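The multi-channel convolution Jia describes can be sketched in a few lines of plain Python. This is a deliberately slow illustration under stated assumptions (stride of 1, no padding, and the cross-correlation form that deep learning frameworks typically compute), not the way Caffe or Intel’s fork implements it. The function name and the sample data are hypothetical.

```python
def conv2d_multichannel(image, filters):
    """Stride-1, unpadded convolution over multiple channels.

    image:   [in_channels][height][width]
    filters: [out_channels][in_channels][k_height][k_width]
    returns: [out_channels][height - k_height + 1][width - k_width + 1]
    """
    in_channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    k_height, k_width = len(filters[0][0]), len(filters[0][0][0])
    out_h, out_w = height - k_height + 1, width - k_width + 1

    output = []
    for filt in filters:                      # one output channel per filter
        channel = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                total = 0.0
                for c in range(in_channels):  # sum across every input channel
                    for dy in range(k_height):
                        for dx in range(k_width):
                            total += image[c][y + dy][x + dx] * filt[c][dy][dx]
                channel[y][x] = total
        output.append(channel)
    return output


# Two 3x3 input channels and two 2x2 filters (all values made up).
image = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
filters = [
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],  # filter 0 reads both channels
    [[[0, 1], [1, 0]], [[0, 0], [0, 0]]],  # filter 1 ignores channel 1
]
result = conv2d_multichannel(image, filters)  # shape: 2 x 2 x 2
```

The single-channel case Jia contrasts this with is simply the same loop with one input channel and one filter. The nested loops also hint at why the workload maps well to parallel hardware: every output value can be computed independently of the others.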

Intel intends to position DLIA towards customers interested in tackling the big, emerging AI jobs: image recognition and fraud detection.

“FPGAs are very useful for artificial intelligence,” Intel’s Davis told reporters. “They’ve been used quite extensively for inference or scoring, to augment the Xeon CPU. Today, 97 percent of scoring or inference is actually run on Intel Xeon processors. But sometimes people do need a bit more.”

Those people, specifically, are customers who are well aware of the purposes they have in mind, and are familiar with the algorithms they intend to use for those purposes, said Davis. It was a surprising statement, given that the DLIA project has been described as geared toward an entry-level product for a broader range of AI users.

Intel has already produced its own fork of the Caffe deep learning framework, and Davis said DLIA will be geared to accelerate that library.

To be released at or about the same time as DLIA is a derivative of the “Knights Landing” generation of Intel’s Xeon Phi processors, the company’s original effort to supplement CPU power. Back in August, at the Intel Developer Forum, Diane Bryant spilled the beans on “Knights Mill,” a variant on the existing generation that’s optimized for AI workloads. Specifically, the variant will focus on improving performance for mixed-precision operations, which may be critical when pairing high-precision algorithms for accuracy with low-precision algorithms for speed.
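Intel has not detailed how Knights Mill will handle mixed precision, so the following is only a sketch of one common reading of the term: store and multiply values in a low-precision format, but accumulate the running sum in a higher-precision one. It uses Python’s standard `struct` module to round values to IEEE 754 half precision. The helper names are hypothetical.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def dot_half_accumulate(weights, inputs):
    """Dot product with half-precision products AND a half-precision sum."""
    total = 0.0
    for w, x in zip(weights, inputs):
        product = to_half(to_half(w) * to_half(x))
        total = to_half(total + product)   # the sum is rounded on every step
    return total


def dot_mixed(weights, inputs):
    """Dot product with half-precision inputs but a double-precision sum."""
    total = 0.0                            # a Python float is a 64-bit double
    for w, x in zip(weights, inputs):
        total += to_half(w) * to_half(x)   # no rounding of the running sum
    return total


# 4,096 identical weights; the values are made up for illustration.
weights = [0.1] * 4096
inputs = [1.0] * 4096

low_precision_sum = dot_half_accumulate(weights, inputs)
mixed_sum = dot_mixed(weights, inputs)
```

With this data the all-half-precision sum eventually stops growing, because each new product becomes too small to register against the running total, while the mixed version keeps counting. That gap is the kind of accuracy-for-speed tradeoff mixed-precision hardware is meant to manage.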

At AI Day on Thursday, Intel promised a 4x performance boost for Knights Mill versus the previous generation of Xeon Phi (which would be “Knights Corner”), for deep learning workloads. The company also plans to imbue the next generation of its mainline Xeon CPUs with its Advanced Vector Instruction set, potentially boosting the product line’s performance with floating-point operations run in parallel.
