Last October, a computer system beat a professional human player at the ancient Chinese board game Go. The AI system, AlphaGo, was built by Google and trained using machine learning techniques.
Google built the hardware that powered AlphaGo in-house, as it does with most of its infrastructure components. At the core of that hardware is the Tensor Processing Unit, or TPU, a chip Google designed specifically for its AI hardware, the company’s CEO, Sundar Pichai, said on stage this morning. He was speaking during the opening keynote of the Google I/O conference, held next to Google headquarters in Mountain View, California.
This is the first time Google has shared any information about the hardware backend that powers its AI, which will play a central role in the company’s revamped cloud services strategy, announced earlier this year. TPUs are part of the infrastructure that supports its cloud services.
The company has been running them in its data centers for about a year, Norm Jouppi, distinguished hardware engineer at Google, wrote in a blog post. The chips run as accelerators, providing a gain in performance per watt that is "roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law)," he wrote.
Pichai shared little detail about the TPU, saying only that its performance per watt was “orders of magnitude higher” than that of any commercially available CPU or GPU (Graphics Processing Unit).
Google CEO Sundar Pichai on stage at Google I/O 2016 (Source: Google I/O live stream)
“Tensor Processing Unit (TPU) is a custom ASIC for machine learning that fits in the same footprint of a hard drive, and was the secret sauce for AlphaGo in Korea,” Google said in an emailed statement.
TPU gets its name from TensorFlow, the software library for machine intelligence that powers Google Search and other services, such as speech recognition, Gmail, and Photos. The company open sourced TensorFlow in November of last year.
The chip is tailored for machine learning. It is more tolerant of "reduced computational precision," which means it needs fewer transistors per operation. "Because of this, we can squeeze more operations per second into the silicon, use more sophisticated and powerful machine learning models and apply these models more quickly, so users get more intelligent results more rapidly," Jouppi wrote.
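To make the idea of reduced computational precision concrete, here is a minimal Python sketch that compares a full-precision dot product with one computed on 8-bit integers. It illustrates the general technique only, not Google's design: the 8-bit width, the single shared scale factor, and the function names `quantize`, `dot_full`, and `dot_quantized` are all assumptions made for this example, since Google has not said which number format the TPU uses.

```python
import random


def quantize(values, bits=8):
    """Map floats to signed integers of the given width using one shared scale.

    Assumption: symmetric quantization with a single scale per vector,
    chosen here purely for illustration.
    """
    qmax = 2 ** (bits - 1) - 1             # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0   # guard against an all-zero vector
    return [round(v / scale) for v in values], scale


def dot_full(a, b):
    """Dot product in ordinary floating point."""
    return sum(x * y for x, y in zip(a, b))


def dot_quantized(a, b, bits=8):
    """Dot product using integer-only multiply-accumulate, rescaled at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # small-integer arithmetic only
    return acc * scale_a * scale_b


random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(256)]
inputs = [random.uniform(-1, 1) for _ in range(256)]

exact = dot_full(weights, inputs)
approx = dot_quantized(weights, inputs)

print(f"full precision:  {exact:.4f}")
print(f"8-bit quantized: {approx:.4f}")
print(f"absolute error:  {abs(exact - approx):.4f}")
```

The quantized result differs slightly from the full-precision one. Many machine learning models can absorb that kind of error, and multiplying small integers takes far less circuitry than multiplying floating-point numbers, which is the trade-off Jouppi describes.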