AMD MI355X Delivers 80% of Nvidia Blackwell Performance at Less Than Half the Hardware Cost
Tags: Hardware · Infrastructure · AI
AMD MI355X inference accelerators achieved approximately 80% of Nvidia B200 throughput at less than half the hardware cost in Wafer benchmarks. Testing on GLM-5.2 showed 2,626 tokens per second per node on the AMD platform. Time-to-first-token on the AMD setup was 0.81 seconds at p50 and 2.22 seconds at p95 under saturation. Single-stream performance reached 213 tokens per second with 10K input tokens and 1,500 output tokens.
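To put the single-stream figure in user-facing terms, the short Python sketch below estimates how long the decode phase of one request would take. It assumes the 213 tokens per second is a steady output-token rate for the whole response, which the benchmark summary does not state explicitly. The function name is illustrative only.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimated time to generate output_tokens at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Figures from the benchmark summary: 1,500 output tokens at 213 tokens/s.
# Assumption: the rate is constant across the whole response.
decode_time = decode_seconds(1500, 213.0)
print(f"Estimated decode time: {decode_time:.2f} s")
```

Under that assumption, a 1,500-token response takes roughly 7 seconds to generate, with time-to-first-token added on top. The reported TTFT percentiles may come from a different load level than the single-stream run, so the two numbers should be combined with care.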
Technical significance
The cost-performance advantage challenges Nvidia's dominance in AI inference. Organizations deploying large language models can reduce capital expenditure by switching to AMD, though doing so requires engineering work on the ROCm stack for quantization and kernel tuning. The price-performance gap may accelerate the commoditization of inference-optimized hardware.
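The size of that advantage follows from simple arithmetic on the two headline ratios. The sketch below is a rough illustration, not a pricing model. It takes relative throughput as 0.80 and relative hardware cost as 0.50, the upper bound implied by "less than half". Actual prices are not given in the benchmark summary, and the function name is hypothetical.

```python
def perf_per_dollar_ratio(relative_throughput: float, relative_cost: float) -> float:
    """Throughput per unit of hardware cost, relative to the baseline system."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_throughput / relative_cost


# Assumptions: ~80% of B200 throughput, hardware cost at most 50% of B200.
# Because 0.50 is an upper bound on cost, the result is a lower bound.
ratio = perf_per_dollar_ratio(0.80, 0.50)
print(f"Throughput per hardware dollar vs. B200: at least {ratio:.1f}x")
```

With these inputs the AMD platform delivers at least 1.6 times the throughput per hardware dollar. The estimate covers purchase cost only and leaves out power, hosting, and the ROCm engineering effort mentioned above.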