VibeThinker-3B: 3B-parameter model matches the frontier reasoning of DeepSeek V3.2, GLM-5, Gemini 3 Pro
Tags: AI · OSS
VibeThinker-3B, a 3B-parameter dense model, reports frontier-level reasoning performance: 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, matching or exceeding models orders of magnitude larger. Training combines curriculum SFT, multi-domain RL, and offline self-distillation. The preprint also introduces the Parametric Compression-Coverage Hypothesis, which suggests that verifiable reasoning compresses into compact cores. It was published on arXiv (2606.16140) on June 15, 2026.
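For readers unfamiliar with the Pass@1 figure, here is a minimal sketch of how pass@k is commonly estimated for code benchmarks, using the standard unbiased estimator 1 - C(n-c, k) / C(n, k) over n samples with c correct. The summary does not say how the preprint computed its score, so the sampling setup and the function names `pass_at_k` and `benchmark_score` are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over (n, c) pairs, scaled to 0-100.

    The per-problem (n, c) layout is a hypothetical input format.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up data: three problems, 8 samples each, with 8, 4 and 0 passes.
    print(benchmark_score([(8, 8), (8, 4), (8, 0)], k=1))
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, so a Pass@1 of 80.2 corresponds to roughly four in five single attempts succeeding on average across problems.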
Technical significance
If independently verified, VibeThinker-3B's results suggest that frontier reasoning capabilities can be compressed into extremely small models through specialized training techniques. This could dramatically reduce inference costs and enable on-device reasoning, though the training methodology requires broader validation.
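To put the on-device claim in rough numbers, the sketch below estimates the memory needed to hold the weights of a 3B-parameter model at a few precisions. It assumes exactly 3e9 parameters and counts weights only (no KV cache, activations, or runtime overhead). The precisions listed are illustrative assumptions, since the summary does not state which formats the model is released in.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    num_params = 3e9  # assumption: "3B" taken as exactly 3e9 parameters
    # Assumed precisions: 16-bit (fp16/bf16), 8-bit and 4-bit quantization.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(num_params, bits):.1f} GB")
```

Under these assumptions the weights take about 6 GB at 16-bit and about 1.5 GB at 4-bit, which is why a model of this size is plausible for laptops and phones while models orders of magnitude larger are not.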