Hardware · 3 min read
NVIDIA Rubin Platform in Full Production: 6 Chips, 10x Lower Inference Token Cost vs Blackwell
Tags: AI · Infrastructure · Hardware
NVIDIA Newsroom · Tom's Hardware

NVIDIA's Vera Rubin platform is now in full production with six co-designed chips: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch. Compared with Blackwell, Rubin delivers up to 10x lower cost per inference token and trains mixture-of-experts (MoE) models with 4x fewer GPUs.

The Rubin GPU packs 336 billion transistors, 288GB of HBM4 memory, and 50 PFLOPS of NVFP4 inference performance. Each Rubin NVL72 rack combines 72 GPUs, delivering 3.6 exaflops of NVFP4 inference with 20.7TB of HBM4 capacity.

Availability in H2 2026 is confirmed from AWS, Google Cloud, Microsoft, and CoreWeave. Microsoft will deploy Vera Rubin NVL72 in its next-generation Fairwater AI superfactories.
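The rack-level figures follow directly from the per-GPU specs quoted above. A minimal sketch of that arithmetic (constant names are illustrative, not NVIDIA's):

```python
# Derive Rubin NVL72 rack-level figures from per-GPU specs in the article.
GPUS_PER_RACK = 72            # NVL72 = 72 Rubin GPUs per rack
NVFP4_PFLOPS_PER_GPU = 50     # 50 PFLOPS NVFP4 inference per GPU
HBM4_GB_PER_GPU = 288         # 288GB HBM4 per GPU

# 72 x 50 PFLOPS = 3,600 PFLOPS = 3.6 exaflops of NVFP4 inference
rack_exaflops = GPUS_PER_RACK * NVFP4_PFLOPS_PER_GPU / 1000

# 72 x 288GB = 20,736GB ≈ 20.7TB of HBM4 (decimal TB)
rack_hbm4_tb = GPUS_PER_RACK * HBM4_GB_PER_GPU / 1000

print(f"{rack_exaflops} EF NVFP4 inference, {rack_hbm4_tb:.1f} TB HBM4")
# → 3.6 EF NVFP4 inference, 20.7 TB HBM4
```

Both headline rack numbers are simple aggregates; the platform's claimed per-token cost advantage comes from the co-designed interconnect and NVFP4 format, not from these raw totals alone.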