Google's sixth-generation TPU Trillium now available in preview

Claims 67% efficiency increase

Google Cloud is bolstering its AI infrastructure with a suite of new hardware and software offerings.

At its App Dev & Infrastructure Summit, the company announced that its sixth-generation Tensor Processing Unit (TPU), Trillium, is now available to Google Cloud customers in preview.

First announced in May 2024, Trillium is designed to accelerate the development and deployment of advanced AI models.

The new chip promises significant performance and efficiency improvements over its predecessor, the TPU v5e: Google cites a 4x boost in training performance and a 3x increase in inference throughput.

The chip's enhanced High Bandwidth Memory (HBM) and Interchip Interconnect (ICI) bandwidth make it well suited for serving large language models (LLMs) such as Google's own Gemma 2 and Meta's Llama, as well as for computationally intensive workloads like diffusion models.
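To see why memory bandwidth, rather than raw compute, tends to gate LLM serving, consider a back-of-the-envelope calculation. This is a sketch only; the model size and bandwidth figures below are illustrative assumptions, not published Trillium specifications:

```python
# Back-of-the-envelope: why HBM bandwidth dominates LLM serving.
# Autoregressive decoding reads every weight once per generated token,
# so single-stream decode speed is roughly bounded by HBM bandwidth.
# All figures are illustrative assumptions, not Trillium specs.

weights_gb = 2 * 9e9 / 1e9    # a 9B-parameter model in bf16: ~2 bytes/param -> ~18 GB
hbm_gb_per_s = 1_600          # assumed per-chip HBM bandwidth in GB/s

tokens_per_s = hbm_gb_per_s / weights_gb
print(f"~{tokens_per_s:.0f} tokens/s upper bound per chip (single stream)")
```

Under these assumptions the ceiling is roughly 89 tokens per second, and doubling HBM bandwidth roughly doubles it, which is why memory improvements matter as much as FLOPs for inference workloads.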

Google has engineered Trillium to scale easily, enabling the connection of up to 256 chips in a single pod. This scalability, coupled with Google's Jupiter datacentre network, allows for near-linear scaling of AI training tasks across thousands of chips.

"Trillium can scale up to 256 chips in a single high-bandwidth, low-latency pod using high-speed interchip interconnect. From there, you can scale to hundreds of pods, connecting tens of thousands of chips in a building-scale supercomputer, interconnected by our 13 Petabit per second Jupiter datacentre network," the company says.

"With Multislice software, Trillium enables near linear scaling of performance for training workloads across hundreds of pods. As the most flop-dense TPU to date, Trillium packs 91 exaflops of unprecedented scale in a single TPU cluster. This is four times more than the largest cluster we built with TPU v5p."

Google also claims a 67% increase in energy efficiency compared with the TPU v5e.

While Trillium is a significant step forward for Google's AI hardware, the company is also embracing Nvidia's latest GPU technology.

Google Cloud is introducing A3 Ultra VMs, powered by NVIDIA H200 Tensor Core GPUs. These VMs offer more GPU memory and higher memory bandwidth than the H100-based A3 Mega VMs, making them well suited for demanding AI and HPC workloads.

"We've been using A3 Mega GPUs powered by NVIDIA H100 Tensor Core GPUs on Google Cloud to power our ML services across multiple regions. We are now excited to try out A3 Ultra VMs powered by NVIDIA H200 Tensor Core GPUs, which we expect will further reduce latency and enhance the responsiveness of JetBrains AI Assistant," said Uladzislau Sazanovich, Machine Learning Team Lead, JetBrains.

Google is also investing in supporting software and storage: Hypercompute Cluster, a unified platform for AI training and inference, along with Hyperdisk ML and Parallelstore, high-performance storage offerings tailored for AI and HPC workloads.

Moreover, Google Cloud is preparing its infrastructure to support NVIDIA's upcoming Blackwell GB200 NVL72 GPUs, which hyperscalers are expected to adopt in early 2025. These sit alongside Google's Axion-based VM series, which runs on the company's custom Arm processors.