Google to Sell New AI 'Supercomputer' Chip Via Cloud Business

Move may test Nvidia’s grip on high-end semiconductor market and help Google catch cloud leaders Amazon, Microsoft


May 17, 2017

Google says its new Cloud TPU V2 delivers up to 180 teraflops to train and run machine learning models (Photo: Google)

Mark Bergen (Bloomberg) -- At the I/O developer conference last year, Google debuted its first chip. The company kept the component mostly for internal artificial intelligence needs. Today, version two arrived -- and Google is selling this one.

Chief Executive Officer Sundar Pichai announced the new chip on Wednesday during a keynote address at the Alphabet Inc. unit’s annual I/O event. Normally, the gathering focuses on mobile software. This year’s spotlight on hardware underscores Pichai’s effort to transform the search giant into an "AI-first" company and a real cloud-computing contender.

Companies will be able to purchase the hardware, called Cloud Tensor Processing Units (TPUs), through a Google Cloud service. Google hopes it will quicken the pace of AI advancements. And despite official statements to the contrary, it may also threaten Intel Corp. and Nvidia Corp., the main suppliers of powerful semiconductors that run large processing tasks.

"This is basically a supercomputer for machine learning," Urs Hölzle, Google’s veteran technical chief, said. Machine learning, a method for deciphering patterns in reams of data, is behind Google’s recent progress on voice recognition, text translation and search rankings. But the approach cost a lot, and sucked up computing time in Google’s data centers. The latest chip was designed to address these issues, and executives said they saw dramatic improvements after putting the component to work on these internal tasks.


A “TPU pod” built with 64 second-generation TPUs delivers up to 11.5 petaflops of machine learning acceleration (Photo: Google)

Google wouldn’t divulge the chip’s price, what company manufactures it, or when the related cloud service goes on sale. Google still purchases processors from Intel and Nvidia. But by relying more on in-house designs, Google could trim its multi-billion-dollar annual computing bill.

Google plans more chips like this, and sees the components as essential for success in the cloud -- a key part of Alphabet’s push to make money beyond digital advertising.

"The field is rapidly evolving," Hölzle said. "For us, it’s very important to advance machine learning for our own purposes and to be the best cloud."

Google’s cloud business grew by more than 80 percent last year, according to estimates from Synergy Research Group. But Amazon Web Services still has over 40 percent of the public cloud market, and continues to expand at a steady clip. Google is third, according to industry estimates.

To gain share, Google is leaning on its AI prowess. The Cloud TPU chip won’t be sold to Dell Inc. and other makers of servers that power traditional corporate data centers. To get the benefits, customers will have to sign up for a Google cloud service and run their software tasks and store their data on Google’s equipment. If companies get on board, Google insists, they can plumb their own data for unseen efficiency gains and profit.

AWS and No. 2 player Microsoft Corp. make similar cases. So Google’s pitch stresses performance. A single Cloud TPU device, composed of four chips, is nearly 12,000 times faster than IBM’s Deep Blue supercomputer, the famous chess victor from 1997, Hölzle said. Google is stringing 64 of the devices into "pods" that sit in its data centers.
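The pod figure follows from the per-device figure by simple multiplication. The short Python sketch below checks that arithmetic using only the numbers Google quoted (up to 180 teraflops per Cloud TPU device, 64 devices per pod). The function and constant names are illustrative, and the calculation assumes peak throughput scales perfectly with the number of devices, which real workloads may not achieve.

```python
# Figures quoted by Google for the second-generation Cloud TPU.
TERAFLOPS_PER_CLOUD_TPU = 180   # "up to 180 teraflops" per Cloud TPU device
DEVICES_PER_POD = 64            # devices strung together into one pod
TERAFLOPS_PER_PETAFLOP = 1000


def pod_petaflops(devices: int = DEVICES_PER_POD,
                  teraflops_each: float = TERAFLOPS_PER_CLOUD_TPU) -> float:
    """Peak pod throughput in petaflops, assuming perfect linear scaling."""
    return devices * teraflops_each / TERAFLOPS_PER_PETAFLOP


print(pod_petaflops())  # 11.52
```

The result, 11.52 petaflops, matches the "up to 11.5 petaflops" Google cites for a full pod.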

Google unveiled its chip at last year’s I/O conference, so why does it need another? First, the company is going up against rivals that develop and deliver faster processors on an annual cadence. To lock in customers, it must match that pace.

In addition, the original chip only worked for "inference," processing data that’s already packaged in mathematical models. It’s akin to compressing large photos into tiny digital formats. For instance, a company could turn an algorithm for voice recognition into an app using inference chips.

To create an algorithm from just raw voices, you need lots of data to train AI software. That takes massive computing power, forcing coders to wait days or weeks to see results. Google’s second chip speeds up the training process. In internal tests, it cut the time in half compared to commercially available graphics processing units, known as GPUs.
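The difference in cost between training and inference can be seen even in a toy model. The Python sketch below is only an illustration, not Google's software or anything that runs on a TPU: it fits a one-variable linear model by gradient descent, which takes thousands of passes over the data, and then makes a prediction with the finished model, which takes a single multiplication and addition. The data, function names, step count and learning rate are all assumptions chosen for the example.

```python
def train(samples, steps=5000, lr=0.01):
    """Training: repeated passes over the data to fit weight w and bias b."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference: one cheap evaluation of the already-trained model."""
    w, b = model
    return w * x + b


data = [(x, 3 * x + 1) for x in range(10)]  # toy data following y = 3x + 1
model = train(data)                         # the expensive step
prediction = infer(model, 20)               # the cheap step, close to 61
```

Here training touches every sample 5,000 times while inference touches none. Production models have millions of parameters and far larger datasets, which is why training is the stage that can take days or weeks.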

Nvidia, the dominant GPU manufacturer, recently announced a new chip, called Volta, that handles training data like Google’s Cloud TPU. An eight-chip Volta module will sell for $149,000 starting in the third quarter.

Google is less experienced at selling chips, so it’s being cautious about commercial deployment. "When you have something that’s really new, some of the tools occasionally break. You want to reach a certain level of maturity," Hölzle said. "We’re probably going to have a lot more demand than we can satisfy."

Excessive demand inspired the creation of TPUs in the first place, according to company lore. Six years ago, Google saw an uptick in voice searches on phones. Just three minutes of conversation a day, per Android phone user, would have doubled the number of data centers Google needed, based on its technology at the time. TPUs were designed to handle the extra volume more efficiently.

The second-generation chip accelerated Google’s own research. For its translation efforts, Google previously ignored more than 80 percent of its data at the training stage, according to Jeff Dean, who leads a Google AI research unit called Brain. With the new chip, researchers can use all the information. That means better trained and potentially more accurate AI software.

The new chip may let researchers use image data that currently sits unused because of high computing costs, according to Fei-Fei Li, an AI expert who runs a machine learning group inside Google’s Cloud business division. Image classification is one of the machine learning tools Li’s team is offering cloud clients, and the new chip will make this more accessible and usable.

EBay Inc. used Google’s cloud to develop ShopBot software that identifies items snapped on smartphone cameras. Today’s image-recognition systems have around ten percent accuracy, said R.J. Pittman, EBay’s Chief Product Officer. The new Cloud TPU, which EBay has tested, could eventually increase accuracy to more than 90 percent, he added.

Companies like EBay want AI to tag every physical good in existence. Li imagines businesses that may want to map every square inch on earth or each minute part of a human cell.

Amazon and Microsoft have their own AI-powered cloud services, though, and both have committed to buy Nvidia’s Volta chips. Nvidia’s data center sales surged 186 percent during the first quarter. "Nvidia is not standing still," said Pittman from EBay, which also buys Nvidia GPUs.

Hölzle dismissed a direct rivalry. Nvidia’s chips are built for more general-purpose tasks, he said, while Google’s focus solely on machine learning.

That won’t calm Intel and Nvidia investors, who worry about in-house chipmaking efforts by their largest customers -- data center operators like Google. Analysts are concerned that revenue and profitability at the two companies, both at historically high levels, may be dented. Even if Google doesn’t succeed in commercializing its own chips, it’s in a better position to negotiate on price.

Google isn’t restricting cloud customers to its own chips. It has Intel and Nvidia processors running inside its data centers. Google’s pushing a Lego-like model -- corporate customers can choose their combination of software and hardware, and rent storage and computing power by the minute. It has to be flexible if it’s going to catch AWS and Microsoft.

"Down the road, we may actually pick the hardware for you that minimizes your cost or maximizes your turnaround time or whatever you tell us is important to you," Hölzle said. "It becomes invisible to you."
