MIT researchers explain why neural scaling laws work through representation superposition
Tags: AI · Research · Models
MIT researchers Yizhou Liu, Ziming Liu, and Jeff Gore published 'Superposition Yields Robust Neural Scaling,' a NeurIPS 2025 Best Paper Runner-up that offers a mechanistic explanation for why LLM performance scales so reliably with model size. The paper demonstrates that representation superposition, in which an LLM represents more features than it has dimensions, is a central driver of neural scaling laws. Under strong superposition, loss generically scales inversely with model dimension across a broad class of feature frequency distributions, because the geometric overlaps between representation vectors shrink as dimension grows. The researchers confirmed that open-source LLMs operate in the strong superposition regime and that Chinchilla scaling laws are consistent with this behavior. The work suggests current scaling may be approaching efficiency limits, and it points toward architectures that encourage superposition as a path to smaller models that match the performance of larger ones.
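The inverse-with-dimension behavior has a simple geometric intuition: when many feature directions are packed into a lower-dimensional space, random pairs of unit vectors in d dimensions have expected squared overlap of roughly 1/d, so interference between features falls off as dimension grows. The numpy sketch below illustrates this with random unit vectors; it is a toy demonstration of the overlap scaling, not the paper's actual model, and the feature count and dimensions are arbitrary values chosen for illustration.

```python
import numpy as np

# Toy illustration of superposition geometry: store n_features feature
# directions in a d-dimensional space with n_features >> d, and measure
# how strongly distinct features interfere. For random unit vectors the
# mean squared overlap is ~ 1/d, mirroring loss scaling inversely with
# model dimension. (Assumed toy values; not the paper's experimental setup.)
rng = np.random.default_rng(0)
n_features = 4096

for d in [64, 128, 256, 512, 1024]:
    # n_features random unit vectors in R^d standing in for feature representations
    vecs = rng.standard_normal((n_features, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

    # Mean squared overlap (interference) between distinct feature directions
    gram = vecs @ vecs.T
    off_diag = gram[~np.eye(n_features, dtype=bool)]
    mean_sq_overlap = np.mean(off_diag**2)

    print(f"d={d:5d}  mean squared overlap={mean_sq_overlap:.5f}  1/d={1/d:.5f}")
```

Running this shows the measured overlap tracking 1/d almost exactly at every dimension, which is the geometric fact behind the paper's claim that, under strong superposition, loss decreases inversely with model dimension.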