Google Cloud and Nvidia said Tuesday that Google would be the first cloud provider to offer Nvidia’s latest GPU for machine learning as a service.
The Nvidia A100 Tensor Core GPU, based on the chipmaker’s new Ampere architecture, represents what Nvidia calls the largest intergenerational leap in performance in its history. The company said the part performed 20 times better than its previous-generation product.
Ampere also differs from its predecessors in that it’s designed for both training and inference machine learning workloads. In prior generations, Nvidia designed a different GPU for each of the two types of workload.
And clients can now kick the tires on Ampere in Google Cloud, as part of a new type of cloud instance the provider also announced Tuesday: Accelerator-Optimized VM, or A2.
The beefiest configuration of Google's A2 cloud instance comes with 16 Ampere GPUs, all interconnected by NVSwitch, Nvidia’s technology for interconnecting many GPUs to form a single computing fabric. That’s 640GB of GPU memory, 1.3TB of system memory, and 9.6TB/s of aggregate bandwidth.
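For a sense of scale, the aggregate figures quoted for the largest A2 configuration can be divided back out per GPU. The following is a rough back-of-the-envelope sketch, not an official spec sheet: it assumes the 16 GPUs are identical and share the totals evenly, and that 1TB equals 1,000GB. The constant and function names are illustrative only.

```python
# Figures quoted in the article for the largest A2 configuration.
GPU_COUNT = 16
TOTAL_GPU_MEMORY_GB = 640
# 9.6TB/s aggregate bandwidth, assuming 1TB = 1,000GB.
AGGREGATE_BANDWIDTH_GB_PER_S = 9600


def per_gpu_share(total, gpu_count):
    """Split an aggregate figure evenly across GPUs (assumes identical GPUs)."""
    return total / gpu_count


memory_per_gpu_gb = per_gpu_share(TOTAL_GPU_MEMORY_GB, GPU_COUNT)
bandwidth_per_gpu_gb_per_s = per_gpu_share(AGGREGATE_BANDWIDTH_GB_PER_S, GPU_COUNT)

print(f"GPU memory per GPU: {memory_per_gpu_gb:g}GB")
print(f"Bandwidth per GPU: {bandwidth_per_gpu_gb_per_s:g}GB/s")
```

Under those assumptions, the totals work out to 40GB of memory and 600GB/s of bandwidth per GPU.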
Smaller A2 configurations are available as well.
For now, A100 instances are only available in alpha. Google Cloud was also first to launch Nvidia’s older T4 GPUs, likewise in alpha, in November 2018. The T4 beta came about three months later, and general availability was announced four months after that.
Google Cloud may be first to offer Ampere GPUs as a service, but it won’t be alone for long. Nvidia CEO Jensen Huang told Bloomberg in mid-May, when the company announced the chip publicly, that it had already delivered the chips to all the major cloud providers. The others (AWS, Azure, Alibaba, Oracle, IBM) will likely roll out their own Ampere cloud infrastructure soon.
Google Cloud was also first to roll out cloud instances powered by AMD’s Epyc 2 chips, which beat comparable Intel parts on both performance and price. The instances, according to Google, would be the most powerful VMs available to Google Cloud users.
Epyc 2 is also the CPU in Nvidia’s own Ampere-based supercomputer for machine learning, the DGX A100, which it announced along with A100 GPUs in May.