While we mostly hear about Artificial Intelligence systems like IBM’s Watson, which won Jeopardy! six years ago; Google’s AlphaGo, which won a game of Go in a match with the ancient Chinese game’s human world champion last year; or Carnegie Mellon’s Libratus, which last month beat one of the world’s top poker players, many computer scientists around the world are working on AI systems that will never appear in the news.
Over the last five years or so, Machine Learning, a type of AI, has been a quickly rising tide that’s now starting to permeate nearly every corner of technology. From self-driving cars to online advertising, cybersecurity, and video surveillance, companies are training computers to do many of the things human workers have been doing but better, or at least cheaper.
Neural networks, computer systems that aim to simulate the way neurons are interconnected in the human brain, are trained to do these tasks the same way babies learn about the world – by observation, repetition, trial, and error, assisted by computer scientists instead of parents – although babies are still much, much better at it. A neural net learns to understand spoken language, for example, by listening to a lot of recorded speech, such as movie dialogue; it learns to identify objects by looking at tons of images. When it makes an error, that data is fed back into the net, which makes fewer and fewer errors with every cycle.
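To make that feedback loop concrete, here is a toy sketch in Python. It is not how any company mentioned in this article trains its networks; it is a single artificial "neuron" learning the made-up rule y = 2x + 1 from five examples, and every name and number in it is an assumption chosen for illustration.

```python
# Toy illustration only: one artificial "neuron" (prediction = w * x + b)
# learning from examples. The data, learning rate, and epoch count are
# assumptions for this sketch, not anything taken from a real system.

def train(samples, epochs=200, learning_rate=0.05):
    """Fit w and b by feeding each prediction error back into the parameters."""
    w, b = 0.0, 0.0
    history = []  # mean squared error per pass over the data
    for _ in range(epochs):
        total_squared_error = 0.0
        for x, target in samples:
            prediction = w * x + b
            error = prediction - target
            # Feed the error back: nudge each parameter to shrink the error.
            w -= learning_rate * error * x
            b -= learning_rate * error
            total_squared_error += error ** 2
        history.append(total_squared_error / len(samples))
    return w, b, history


# Hypothetical training data generated from the rule y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b, history = train(samples)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"error on first pass: {history[0]:.4f}, on last pass: {history[-1]:.6f}")
```

Real deep learning systems repeat the same idea across millions of parameters and examples, which is why the training step is so computationally expensive.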
Training is the most resource-intensive computing workload in the machine learning development process. The explosion of deep learning software development (deep learning is the most widespread machine learning technique) is driving a growing need for specialized computing infrastructure, geared for the types of workloads required to train neural nets. These computers are similar to high-performance computing (HPC) systems scientists use and as such require lots of power and cooling capacity from the data centers that host them.
The Artificial Mind is Power-Hungry
Seeing a business opportunity in this trend, a Poway, California-based company called Cirrascale recently pivoted from being a high-performance hardware vendor and cloud service provider to being a specialist in designing and hosting compute infrastructure for deep learning. In addition to selling the high-octane hardware, the company uses its data center outside of San Diego to provide this infrastructure as a service, somewhat similar to the way Amazon Web Services provides its cloud servers but with a few key differences.
“These types of boxes are very powerful,” David Driggers, the company’s CEO and founder, said in an interview with Data Center Knowledge. Because they have a lot of computing muscle, they are extremely power-hungry. Unlike AWS, which provides virtual server instances, Cirrascale’s deep learning cloud is a bare-metal cloud service. You get a dedicated high-performance box (or several) to run whatever software you need on it.
Driggers said many of his customers doing machine learning development work are new to the world of high-performance computing. It’s not trivial to set up, manage, and cool an HPC cluster, and they are happy to offload that problem to someone who understands it.
Cirrascale’s data center is designed to provide power densities north of 30 kW per rack (power density in an ordinary enterprise data center is 3 to 5 kW per rack, rarely exceeding 10 kW). “That’s a lot of wattage,” Driggers said. “Doing that part of it is hard, and we’re not charging a huge premium for that.”
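For a rough sense of scale, the arithmetic behind those figures can be sketched in a few lines of Python. The 30 kW and 3 to 5 kW values come from the paragraph above; the variable and function names are made up for the sketch, and 30 kW is treated as a floor since the design target is "north of" it.

```python
# Figures from the article; names are illustrative assumptions.
DEEP_LEARNING_KW_PER_RACK = 30.0      # "north of 30 kW per rack"
ENTERPRISE_KW_PER_RACK = (3.0, 5.0)   # ordinary enterprise range


def density_ratio(dense_kw, typical_kw):
    """How many typical racks' worth of power one dense rack draws."""
    return dense_kw / typical_kw


low_kw, high_kw = ENTERPRISE_KW_PER_RACK
ratios = (
    density_ratio(DEEP_LEARNING_KW_PER_RACK, high_kw),  # vs. a 5 kW rack
    density_ratio(DEEP_LEARNING_KW_PER_RACK, low_kw),   # vs. a 3 kW rack
)
print(f"One 30 kW rack draws {ratios[0]:.0f}x to {ratios[1]:.0f}x the power of an ordinary rack")
```

In other words, a single rack at this density draws as much power as six to 10 ordinary enterprise racks.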
A cabinet housing Cirrascale's bare-metal cloud platform in the company's data center outside of San Diego (Photo: Cirrascale)
To cool that kind of density, the data center uses a proprietary liquid cooling system developed by ScaleMatrix, which owns and operates the Cirrascale data center. Instead of cool air traveling from front to back of the IT equipment (as it does in most data centers), the system pushes air at extremely high velocity from bottom to top, exhausting warm air at the top of the server cabinet. Each cabinet is a closed environment and has its own water supply and air circulation system, which ensures neighboring cabinets don’t affect each other’s temperature.
After many years of building high-performance computing systems, Cirrascale – whose previous incarnation was Verari Systems, the HPC hardware and data center container vendor that went bust in 2009 – has felt at home in the deep learning space, which it entered two years ago. “We’ve been doing 30 kW for well over 10 years, so we’re comfortable with standing up high-performance computing,” Driggers said.
Linking the Virtual Neurons
HPC systems and systems used to train deep neural networks are built using fairly similar architectures. Driggers believes that as deep learning infrastructure matures and starts to scale, its architecture is going to look more and more like that of HPC systems.
The workhorse in this architecture is the GPU, or, more accurately, a group of GPUs networked together, computing in parallel. A single Cirrascale server for deep learning packs up to eight Tesla GPUs by NVIDIA (currently the GPU leader in deep learning), working in concert with an Intel Xeon CPU. Its most powerful cloud system has eight dual-GPU accelerators, being in effect a 16-GPU server, which you can rent for about $7,500 per month.
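As a back-of-the-envelope check on that price, the sketch below breaks the quoted monthly rate down per GPU. The $7,500 and 16-GPU figures are from the paragraph above; the 730 hours per month used for the hourly estimate is an assumption (roughly 365 days times 24 hours, divided by 12), and the names are illustrative.

```python
# Figures from the article, except HOURS_PER_MONTH, which is an assumption.
MONTHLY_PRICE_USD = 7500.0   # "about $7,500 per month"
GPUS_PER_SERVER = 16         # eight dual-GPU accelerators
HOURS_PER_MONTH = 730.0      # assumed average month length in hours

cost_per_gpu_month = MONTHLY_PRICE_USD / GPUS_PER_SERVER
cost_per_gpu_hour = cost_per_gpu_month / HOURS_PER_MONTH

print(f"About ${cost_per_gpu_month:.2f} per GPU per month")
print(f"About ${cost_per_gpu_hour:.2f} per GPU per hour if the server runs around the clock")
```

The per-hour figure only holds if the server is kept busy all month; a bare-metal box is rented whether or not it is in use.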
Cirrascale's GX8 Series server with eight of NVIDIA's Tesla GPUs, a deep learning workhorse (Photo: Cirrascale)
Cirrascale’s single most important innovation, its technological crown jewel, is a special way of interconnecting GPUs in a single system. Called PCIe Switch Riser, it enables any GPU to talk directly to any other GPU on the motherboard at maximum bandwidth, helping both performance and scalability.
DGX-1, NVIDIA’s own supercomputer designed specifically for deep learning, is configured in a similar way, Driggers said. The chipmaker’s GPU interconnection technology is called NVLink. He conceded that if you need “absolute cutting edge,” you should go with the NVIDIA box. But, if you can tolerate 15 percent lower performance while paying half the price, Cirrascale has a comparable system with the same NVIDIA P100 GPUs, he said. It sells the DGX-1 as well.
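Driggers's trade-off can be put in performance-per-dollar terms. The sketch below uses only the two relative figures he cited (15 percent lower performance, half the price) and normalizes the DGX-1 to 1.0 on both axes; it says nothing about absolute prices or benchmark results, and the function name is made up for the illustration.

```python
def performance_per_dollar(relative_performance, relative_price):
    """Relative performance delivered per unit of relative price."""
    return relative_performance / relative_price


# The DGX-1 is the baseline; the alternative uses the figures Driggers cited.
dgx1 = performance_per_dollar(1.00, 1.00)
alternative = performance_per_dollar(0.85, 0.50)
advantage = alternative / dgx1

print(f"The cheaper system delivers about {advantage:.1f}x the performance per dollar")
```

By that measure the cheaper system comes out well ahead, which is the point of the pitch, although a buyer who needs the fastest possible training time may still prefer the faster box.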
A look inside NVIDIA's DGX-1 supercomputer, the "absolute cutting edge" in deep learning hardware (Photo: Yevgeniy Sverdlik)
Startup Solving for Common Sense
While a lot is written about deep learning today, few companies are actually using the technology in production. Hyperscale cloud operators like Google and Facebook are applying it in many of their user-facing features, but most of the companies working in the field are still in development stages, and that’s true for the majority of Cirrascale’s cloud customers, who are writing algorithms and learning to scale their deep learning applications to handle larger data sets.
Today, each of these customers is taking a handful of nodes, a small subset of what Driggers believes they will eventually need. As they grow and their applications mature, he anticipates the preferred infrastructure model will be hybrid, combining private and public cloud.
One customer already using a hybrid set-up is Twenty Billion Neurons, or twentybn. The Berlin-based startup with a research lab in Toronto was founded a year ago by a group of academics who believe that the dominant neural-net training technique for some of the most promising applications, such as self-driving cars, is flawed and already ripe for disruption.
Instead of using still images to train neural nets to identify objects, the dominant approach, twentybn uses video. “Our mission is to teach machines to perceive and understand the world,” Roland Memisevic, the company’s chief scientist and one of its co-founders, said in an interview. Memisevic is a professor at the influential Montreal Institute of Learning Algorithms and a former doctoral student of Geoffrey Hinton, a key figure in the development of deep learning as we know it today.
That the world is three-dimensional; that there’s gravity; that an object has permanent features and can get from point A to point B only by moving – concepts a human being has a firm grasp of by the time she reaches three – are things that are extremely difficult for a machine to understand by looking at still images, Memisevic explained, adding that there’s strong scientific reason to believe that the only way it can gain that understanding is through video.
Twentybn has paid an army of internet users to shoot more than 60,000 short video clips of themselves doing simple things like throwing objects against walls, dropping objects, or picking them up, videos “generated to reflect things that we want the network to learn,” he said. The company is using these and synthesized videos to train its neural networks with the goal of selling custom AI solutions for autonomous vehicles and video surveillance.
Twentybn uses Cirrascale’s GPU-packed bare-metal cloud servers to train its neural nets but also keeps its own computing cluster in-house, at its lab in Toronto, to handle the massive amount of synthesized video it generates.
A Post-GPU Future?
Memisevic believes technologies that improve communication between GPUs, like the cloud provider’s Switch Riser, are going to be indispensable in the future, as neural networks get bigger and bigger. However, it’s unclear at the moment what the best way to harness a lot of GPUs will be over time; today there are several approaches.
Because what we’re witnessing is just the beginning of what is expected to drive the next technological revolution, there are still a lot of unknowns about the kind of computing and data center infrastructure machine learning or other types of AI will ultimately require. “We looked around for every company like mine that has to find a way to harness GPUs to train networks, and we have been and are still exploring multiple directions toward using those GPUs,” Memisevic said.
Using hybrid cloud was a strategic decision twentybn made precisely because of the uncertainty of what its future computing needs may be. It’s putting two ponies in the race, and one of them is a rental. Even GPUs themselves may eventually be replaced by something that simulates neural nets better and more efficiently, he said. Today’s brute-force approach to making these networks more powerful by plugging in more GPUs is far from ideal.
In fact, he is convinced there will be a better alternative. After all, the human brain is a lot more powerful than a GPU cluster while using a tiny fraction of the energy and occupying a tiny fraction of the space the cluster does. “Right now we’re scaling, scaling, scaling; and that’s going to grow,” he said. “Demand for high-power computation on GPUs is unfortunately going to grow over the years. GPUs, as compared to brains, use ridiculously large amounts of electricity; there could be something so much better that uses so much less power.”