Research · 3 min read
NIST CAISI Evaluation Finds DeepSeek V4 Pro Lags US AI Frontier by Approximately 8 Months
Tags AI · Models · Research · Policy
NIST · TechCrunch

NIST's Center for AI Standards and Innovation (CAISI) independently evaluated DeepSeek V4 Pro and found that it trails leading US models by about 8 months in aggregate capability, measured on 9 benchmarks spanning 5 domains (cyber, software engineering, natural sciences, abstract reasoning, and mathematics). The gap was pronounced on held-out benchmarks that were not available during training: DeepSeek scored 46% on ARC-AGI-2 versus GPT-5.5's 79%, and 44% on PortBench versus 78%. DeepSeek V4 Pro is a 1.6-trillion-parameter model (49 billion active) and the largest open-weight model available. CAISI also noted that DeepSeek V4 Pro is more cost-efficient than comparable US models, costing less than GPT-5.4 mini on 5 of the 7 benchmarks tested.