Google unveils second-generation Tensor Processing Unit designed for AI

Google spills some details on its deep learning chips

Internet giant Google has spilled some details about its latest Tensor Processing Unit custom AI chips which, it claims, can deliver 11.5 petaflops of processing power when 64 of them are linked together in a TPU v2 "pod".

The company originally announced the chip - the second version of its TPU - at its I/O developer conference back in May. However, it has remained tight-lipped about the chip's actual capabilities and specifications.

The TPU v2 is Google's second stab at a custom-designed AI chip. But at the Neural Information Processing Systems (NIPS) conference last week, Google senior engineer Jeff Dean gave a presentation on the new chip.

Dean has previously talked about how the chip can run machine learning models to improve language translation and image recognition.

"We're excited to announce that our second-generation Tensor Processing Units (TPUs) are coming to Google Cloud to accelerate a wide range of machine learning workloads, including both training and inference," he said.

"We've witnessed extraordinary advances in machine learning over the past few years. Neural networks have dramatically improved the quality of Google Translate, played a key role in ranking Google Search results and made it more convenient to find the photos you want with Google Photos.

"Machine learning allowed DeepMind's AlphaGo program to defeat Lee Sedol, one of the world's top Go players, and also made it possible for software to generate natural-looking sketches."

According to ZDNet, the TPU v2s require their own custom high-speed interconnect, with each TPU v2 unit delivering up to 180 teraflops of compute. Google claims that they can easily be combined to form supercomputers.

While a degree of customisation is required, they can be used with Google Compute Engine as "Cloud TPUs" programmable via TensorFlow.
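For a sense of what that programming model looks like, here is a minimal sketch using TensorFlow's current Cloud TPU APIs, which postdate this announcement; the TPU name "my-tpu" is a placeholder for whatever a given Compute Engine setup provides:

```python
import tensorflow as tf

# Point TensorFlow at a Cloud TPU. The name "my-tpu" is a placeholder for
# whatever your own Compute Engine / Cloud TPU setup provides.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# The strategy replicates computation across the TPU's cores.
strategy = tf.distribute.TPUStrategy(resolver)

# Any model built under the strategy scope is placed on the TPU;
# the model code itself is ordinary TensorFlow/Keras.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```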

In his presentation, Dean discussed the design of the TPU pods and TPU v2 chips. The pods, he said, each contain 64 TPU v2 units, offering a total estimated compute power of 11.5 petaflops.

Every TPU v2 comprises four TPU chips, each with 16GB of high-bandwidth memory (HBM), for 64GB in total. A TPU v2 can address 2,400GB/s of memory bandwidth, and each unit is designed to be connected to other units to build more powerful computers.
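The arithmetic behind those headline numbers checks out; a quick sketch using only the figures reported above (the pod-wide memory total is derived here, not something Google stated):

```python
# Figures as reported: four chips per TPU v2 unit, 16GB of HBM per chip,
# 180 teraflops per unit, 64 units per pod.
CHIPS_PER_UNIT = 4
HBM_PER_CHIP_GB = 16
TFLOPS_PER_UNIT = 180
UNITS_PER_POD = 64

hbm_per_unit_gb = CHIPS_PER_UNIT * HBM_PER_CHIP_GB   # 64GB, as stated
pod_tflops = UNITS_PER_POD * TFLOPS_PER_UNIT         # 11,520 TF, i.e. ~11.5 petaflops
pod_hbm_tb = UNITS_PER_POD * hbm_per_unit_gb / 1024  # 4TB of HBM per pod (derived)

print(hbm_per_unit_gb, pod_tflops, pod_hbm_tb)
```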

Meanwhile, each TPU v2 chip comes with two cores. "For large models, model parallelism is important. But getting good performance given multiple computing devices is non-trivial and non-obvious," he added.
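Dean's point concerns splitting a single model across devices. As a toy illustration only - not Google's TPU programming model - here is the simplest layer-wise split in TensorFlow, using two logical CPU devices as stand-ins for accelerator cores:

```python
import tensorflow as tf

# Split one physical CPU into two logical devices so the sketch runs anywhere;
# on real hardware these would be separate accelerator cores.
cpu = tf.config.list_physical_devices("CPU")[0]
tf.config.set_logical_device_configuration(
    cpu,
    [tf.config.LogicalDeviceConfiguration(),
     tf.config.LogicalDeviceConfiguration()])

x = tf.random.normal([8, 512])

# The first half of the model lives on device 0...
with tf.device("/CPU:0"):
    hidden = tf.keras.layers.Dense(1024, activation="relu")(x)

# ...and the second half on device 1. The activation tensor `hidden` has to
# cross the device boundary, which is where performance is won or lost.
with tf.device("/CPU:1"):
    logits = tf.keras.layers.Dense(10)(hidden)
```

The difficulty Dean alludes to is choosing such splits across many devices so that no device sits idle waiting for another's activations.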

Dean's full presentation slides can be downloaded here.