IBM Unveils AI Toolkit for ‘Minsky,’ Its Server Hard-Wired to Nvidia GPUs

While IBM moves forward with efforts to make its high-speed OpenCAPI data bus an international standard, clearing the way for new players and new FPGA accelerator designs to enter the space, on Monday the company also placed a substantial bet on the class of accelerator made famous by the PCI bus: general-purpose GPUs. IBM formally launched a collaboration with GPU maker Nvidia on a software library called PowerAI, designed specifically to take advantage of its new line of servers with hard-wired GPU acceleration.

One such server, first disclosed last September, is a custom-built, Power8-based machine dubbed “Minsky.” Built on IBM’s existing S822LC, it hard-wires its CPUs and GPUs together using Nvidia’s proprietary NVLink interconnect.

“We actually created a new chip that we call the Power8 NVLink processor,” stated Sumit Gupta, IBM’s vice president for high-performance computing and analytics, in an interview with Data Center Knowledge.

“This has a high-speed, NVLink interface embedded in it. This is a proprietary, private interface, only on our CPU and Nvidia’s new Pascal GPU. This has enabled us to build a server that has much faster communication between the CPU and GPU. . . This server, because of this interface between the processors, gives us a very big performance advantage.”

IBM’s Power8-based S822LC is a 2U, two-socket (2P) unit built with four CAPI-enabled PCIe expansion slots. Originally, through IBM’s existing partnership with Nvidia, the S822LC chassis was optimized to support Tesla K80 dual-GPU accelerators. Minsky, by contrast, is designed to support Nvidia’s newer Pascal P100 GPUs, connected over the proprietary NVLink interconnect.

While We’re At It, Let’s Uproot Hadoop

Gupta told us he believes Minsky’s advantages will be realized in distributed applications: not just by accelerating the workload on a single server, but by accelerating workloads across entire clusters of Minsky servers.

He told us the story of an unnamed mid-size IBM customer in the business of producing consumer events. The company invested in GPU-accelerated servers, but soon ran into bottlenecks handling its unstructured data. Gupta asked which file system it was using, and the answer was HDFS, the Hadoop Distributed File System.

“I said, ‘HDFS was never built for this kind of throughput!’” Gupta continued. “We got into a discussion about IBM’s parallel file system, which we invented for the HPC space: GPFS [now Spectrum Scale]. That company realized it needed to use some of these parallel file systems. It’s these secondary issues that people aren’t thinking about, when they start on this journey.”

It’s a clever argument in favor of a full-stack approach to highly parallel, highly distributed workload development, tightening the bonds between the underlying algorithm libraries, the GPUs, and the CPUs. So it’s no surprise that Minsky’s unveiling comes in conjunction with IBM’s release of a deep learning AI toolkit, called PowerAI, leveraging GPUs linked to IBM Power CPUs by way of NVLink.

“Every retailer, bank, or consumer-facing customer we talk to, and even logistics companies with customer-facing Web sites, are looking at how they can use chatbots. How can they automate their call centers to improve the quality of service?” IBM’s Gupta asked. “People are looking at how they can take advantage of all the data that’s coming in from social media, from customers browsing websites, from customer purchasing histories. These new methods that use machine learning and deep learning are becoming extremely effective in enabling customers to take advantage of these use cases.”

Parallelism and Profiling

IBM says it tested S822LC Minsky servers with two 8-core Power8 CPUs and four Nvidia Pascal P100 GPUs, running AlexNet (the image-recognition neural network that University of Toronto researcher Alex Krizhevsky trained on the ImageNet dataset) through the popular Caffe framework, using IBM’s newly released PowerAI library. IBM claims double the performance of the same workload running on Power S822L servers with two 10-core CPUs and four Nvidia Tesla M40 GPU accelerators.


Meanwhile, for its Pascal P100 GPUs, Nvidia has been promising up to 48 times the performance on some benchmarks versus a standard dual-socket server equipped with Intel’s fourth-generation Haswell processors.

The reason, Nvidia says, is its use of high-bandwidth HBM2 memory, which sits in the same package as the GPU die rather than on a separate bus, eliminating the need for GDDR5 video memory.

Of course, those benchmarks are using artificial workloads, which in the artificial intelligence space may be too artificial. There’s an argument to be made that an operating profile for real-world work in the emerging AI space has yet to be determined — for example, picking out the profile of a prospective customer based on. . . shall we say, very delicately, looks.

“Nobody expects this to be easy. I don’t think I’ve spoken to anybody at these [HPC] forums who doesn’t realize this is not an easy task,” admitted IBM’s Gupta. “In fact, you know it’s not easy, because honestly, there’s no website that you can do this on today. Not even the born-on-the-Web companies have been able to solve the problem of retail image analysis yet. So I think people realize these are tough challenges.”

But one client told Gupta it had finally reached critical mass: it can now efficiently and affordably collect and store all the data coming in. Data lakes now have a respectable, purposeful identity. What organizations need next is a similarly affordable way to assess what they actually have.

“The data scientist guys don’t want to worry about the infrastructure, making sure the damned software compiles and it scales to a cluster,” he said. “They want that part of it taken care of, so that they can focus on data science.”

Gupta is hopeful that once PowerAI establishes a baseline level of support for AI functionality, developers of all sizes will be able to build on that baseline as though it were a platform in itself. Some independent developer somewhere, he feels, will be able to become established in the AI space with a deep learning library for retail that builds on PowerAI’s accelerated components.

IBM’s new framework is now available for download from its deep learning frameworks portal.
