Google Launches 8th-Gen TPU Split Into Separate Training and Inference Chips
Tags: Infrastructure · AI
Source: CNBC

For the first time, Google's 8th-generation TPU splits training and inference into separate specialized chips. The inference chip (TPU 8i) contains 384 megabytes of SRAM — triple the amount in Google's previous-generation Ironwood TPU — and is designed to deliver high throughput and low latency for running millions of AI agents cost-effectively. Both chips are slated to become available later in 2026. The move echoes the industry trend of pairing inference workloads with large on-chip SRAM and reflects the growing weight of Google's custom silicon in AI infrastructure. Google's decision to split the TPU architecture underscores how inference, rather than training, is becoming the primary driver of AI compute demand in the agent era.