Thinking Machines Lab Debuts Full-Duplex 'Interaction Models' with 0.4s Response Latency
Tags: AI · Infrastructure
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, announced a research preview of TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model with 12B active parameters that processes audio, video, and text simultaneously and responds in 0.40 seconds, comparable to natural human conversational latency. The model uses a multi-stream, micro-turn architecture that ingests input and generates output concurrently, unlike the sequential listen-then-respond approach of existing systems. On the FD-bench V1.5 interaction-quality benchmark, it scores 77.8, versus 46.8 for GPT-Realtime-2 and 54.3 for Gemini 3.1 Flash Live. A limited research preview is expected in the coming months, with a wider release later in 2026.
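No implementation details or API have been published, but the micro-turn idea can be illustrated with a toy sketch: instead of waiting for the speaker to finish, the loop alternates between ingesting a small slice of input and emitting a small slice of output on every tick. All names below (`ToyDuplexModel`, `micro_turn_loop`, the 80 ms tick) are illustrative assumptions, not the lab's actual design.

```python
TICK_MS = 80  # hypothetical micro-turn length; the real slice size is unknown


class ToyDuplexModel:
    """Toy stand-in for the model: replies to each input slice one tick later."""

    def __init__(self):
        self._queued = None  # reply produced during the current micro-turn
        self._reply = None   # reply carried over from the previous micro-turn

    def ingest(self, chunk):
        # A real model would update multimodal state here; we just queue an echo.
        self._queued = chunk.upper()

    def emit(self, tick_ms):
        # Emit what was prepared last tick, so input and output overlap in time.
        out = self._reply
        self._reply = self._queued
        return out


def micro_turn_loop(input_stream, model):
    """Interleave ingestion and generation in fixed-size micro-turns."""
    for chunk in input_stream:
        model.ingest(chunk)        # consume ~one tick of audio/video/text
        out = model.emit(TICK_MS)  # generate up to one tick of output
        if out is not None:
            yield out              # stream back without waiting for a turn end


replies = list(micro_turn_loop(["hi", "there", "bye"], ToyDuplexModel()))
print(replies)  # → ['HI', 'THERE']
```

The point of the sketch is the overlap: the model is already emitting its reply to "hi" while it is still ingesting "there", which is the structural difference from a sequential listen-then-respond pipeline.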
Technical significance
Full-duplex interaction represents a fundamental architectural shift from sequential to concurrent processing, which could redefine voice and video AI interfaces. If the 0.4s latency holds in production, it would enable genuinely natural real-time collaboration between humans and AI, with implications for customer service, education, and accessibility. The 276B MoE with only 12B active parameters also demonstrates that sparse architectures can deliver real-time performance without proportional compute costs.
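The sparse-compute claim can be checked with back-of-the-envelope arithmetic: per-token forward-pass cost scales with active parameters (roughly 2 FLOPs per active parameter per token, a standard estimate), so routing to 12B of 276B parameters costs about what a 12B dense model does per token. The FLOP constant is an assumption for illustration.

```python
TOTAL_PARAMS = 276e9   # total MoE parameters
ACTIVE_PARAMS = 12e9   # parameters activated per token

# Rough rule of thumb: ~2 FLOPs per active parameter per generated token.
flops_per_token_moe = 2 * ACTIVE_PARAMS    # ~2.4e10 FLOPs
flops_per_token_dense = 2 * TOTAL_PARAMS   # ~5.5e11 FLOPs if all experts ran

print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")            # → 4.3%
print(f"saving vs. dense: {flops_per_token_dense / flops_per_token_moe:.0f}x")  # → 23x
```

Only about 4% of the parameters are exercised per token, a ~23x per-token compute saving over running the full 276B densely, which is what makes the 0.4 s latency target plausible on real hardware.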