Today IBM announced the availability of a beta version of its Distributed Deep Learning software, which it says has demonstrated “a leap forward in deep learning performance.”
Deep learning is a form of AI that relies on the application of “artificial neural networks” inspired by the biological neural networks of human and animal brains. Its focus is on giving computers the ability to “understand” the contents of digital images, videos, audio recordings and the like in much the same way that people do.
Much of the potential for deep learning remains unfulfilled, however, because processing the large amount of data required for a system’s “deep level training” is slow and can take days or even weeks. Accuracy adds to the time as well: the system must be trained multiple times to reach the desired results, and higher accuracy on each pass means fewer times the computer must be “retrained” until it gets it right.
Reducing the training time has been difficult because merely adding more compute power, with faster processors and more of them, does not speed things up proportionally. As the number of “learner” processors increases, the computation time per learner decreases, as expected, but the communication time per learner stays constant, so communication takes up a growing share of each training step.
In other words, bottlenecks get in the way.
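The trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not IBM's method: the function names and the default timings (100 units of compute, 1 unit of communication per learner) are hypothetical, and the model assumes the compute work divides evenly across learners while communication time per learner stays fixed.

```python
def step_time(n_learners, compute_time=100.0, comm_time=1.0):
    """Time for one training step with n_learners working in parallel.

    compute_time: hypothetical time for one learner to do all the work alone.
    comm_time: hypothetical per-learner communication time, constant in n.
    """
    if n_learners == 1:
        return compute_time  # a single learner has nobody to communicate with
    return compute_time / n_learners + comm_time


def speedup(n_learners, compute_time=100.0, comm_time=1.0):
    """How many times faster the step is than with a single learner."""
    return compute_time / step_time(n_learners, compute_time, comm_time)


def scaling_efficiency(n_learners, compute_time=100.0, comm_time=1.0):
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return speedup(n_learners, compute_time, comm_time) / n_learners


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(f"{n:>3} learners: speedup {speedup(n):6.1f}x, "
              f"efficiency {scaling_efficiency(n):.0%}")
```

In this toy model the speedup can never exceed compute_time / comm_time, however many learners are added, which is one way to picture the "diminishing return" IBM describes.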
“Successful distributed deep learning requires an infrastructure in which the hardware and software are co-optimized to balance the computational requirements with the communication demand and interconnect bandwidth,” IBM explained in a research paper. “In addition, the communication latency plays an important role in massive scaling of GPUs (over 100). If these factors are not kept under control, distributed deep learning can quickly reach the point of diminishing return.”
This has kept most deep learning projects limited to single-server implementations. It’s also where the research and new software IBM unveiled today come into play. The company says it has found a way to speed up training while also producing more accurate results.
“Most popular deep learning frameworks scale to multiple GPUs in a server, but not to multiple servers with GPUs,” Hillery Hunter, director of systems acceleration and memory at IBM Research, wrote in a blog post. “Specifically, our team wrote software and algorithms that automate and optimize the parallelization of this very large and complex computing task across hundreds of GPU accelerators attached to dozens of servers.”
In tests of the software, IBM researchers achieved record-low communication overhead and 95 percent scaling efficiency when running the Caffe deep learning framework on a cluster of 64 IBM Power systems, each with four Nvidia Tesla P100-SXM2 GPUs attached, for a total of 256 GPUs. This bested the previous scaling efficiency record of 89 percent, demonstrated by Facebook AI Research using smaller learning models and data sets, which reduced complexity, according to IBM.
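To put the scaling figures in perspective, the short sketch below works through the arithmetic. It assumes the common definition of scaling efficiency (achieved speedup divided by the number of GPUs), which may not match IBM's exact measurement, and it applies the earlier 89 percent figure to the same GPU count purely for comparison. The function name is hypothetical.

```python
SERVERS = 64
GPUS_PER_SERVER = 4
TOTAL_GPUS = SERVERS * GPUS_PER_SERVER  # cluster size reported in the test


def effective_speedup(n_gpus, efficiency):
    """Speedup over one GPU, assuming efficiency = speedup / n_gpus."""
    return n_gpus * efficiency


if __name__ == "__main__":
    print(f"Total GPUs: {TOTAL_GPUS}")
    # 95 percent is the efficiency IBM reported for this cluster.
    print(f"At 95% efficiency: {effective_speedup(TOTAL_GPUS, 0.95):.1f}x")
    # Assumption: the 89 percent figure is applied to the same GPU count.
    print(f"At 89% efficiency: {effective_speedup(TOTAL_GPUS, 0.89):.1f}x")
```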
In addition, the tests produced a record image recognition accuracy of 33.8 percent for a neural network trained on a data set of 7.5 million images, besting the previous accuracy record of 29.8 percent posted by Microsoft.
“My team in IBM Research has been focused on reducing these training times for large models with large data sets,” Hunter wrote. “Our objective is to reduce the wait-time associated with deep learning training from days or hours to minutes or seconds, and enable improved accuracy of these AI models. To achieve this, we are tackling grand-challenge scale issues in distributing deep learning across large numbers of servers and GPUs.”
Hunter and her team have certainly made a big start in speeding up the process — completing the test in only seven hours.
“Microsoft took 10 days to train the same model,” she said, referring to the previous industry record. “This achievement required we create the distributed deep learning code and algorithms to overcome issues inherent to scaling these otherwise powerful deep learning frameworks.”
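The training-time and accuracy figures quoted above can be compared with a few lines of arithmetic. This is a rough sketch only: the two runs used different hardware and software, so the ratio says nothing about where the gain came from, and the helper names are hypothetical.

```python
HOURS_PER_DAY = 24


def time_ratio(previous_days, new_hours):
    """How many times shorter the new run is than the previous one."""
    return (previous_days * HOURS_PER_DAY) / new_hours


def accuracy_gain(previous_pct, new_pct):
    """Return (gain in percentage points, gain relative to the previous figure)."""
    points = new_pct - previous_pct
    return points, points / previous_pct


if __name__ == "__main__":
    # 10 days (Microsoft) versus 7 hours (IBM), as quoted in the article.
    print(f"Training time ratio: {time_ratio(10, 7):.1f}x")
    # 29.8 percent (previous record) versus 33.8 percent (IBM).
    points, relative = accuracy_gain(29.8, 33.8)
    print(f"Accuracy gain: {points:.1f} points ({relative:.1%} relative)")
```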
A beta version, or technical preview, of the code Big Blue developed around the test — IBM Research Distributed Deep Learning software — became available today in IBM PowerAI 4.0, making the cluster scaling feature available to developers using deep learning for training AI models.
“We expect that by making this DDL feature available to the AI community, we will see many more higher accuracy runs as others leverage the power of clusters for AI model training,” Hunter said.