Thinking Machines Lab Unveils Full-Duplex 'Interaction Models' for Real-Time AI Conversation
Tags: AI · Research

Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, unveiled 'interaction models' — a new class of natively multimodal AI systems that process input and generate output simultaneously in full-duplex mode. The first model, TML-Interaction-Small, is a 276-billion-parameter Mixture-of-Experts (MoE) system with 12 billion active parameters, achieving 0.40-second response times. Rather than relying on separate pretrained encoders, it uses encoder-free early fusion: raw audio and image patches pass through a lightweight embedding layer that is co-trained from scratch with the rest of the network.
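To make the architecture concrete, here is a minimal sketch of what "encoder-free early fusion" could look like: raw audio frames and image patches are each projected by a lightweight linear embedding into a shared token sequence, with no separate pretrained encoders. All dimensions, function names, and weights below are illustrative assumptions, not TML's actual implementation.

```python
# Illustrative sketch of encoder-free early fusion (assumed design, not TML code).
import random

EMBED_DIM = 8      # toy model dimension; real models use thousands
AUDIO_FRAME = 4    # raw audio samples per token (illustrative)
PATCH_SIZE = 4     # flattened pixels per image patch (illustrative)

def make_linear(in_dim, out_dim, seed):
    """Random weight matrix standing in for a learned linear layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(vec, weights):
    """Apply a linear layer: one weight row per output feature."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

# One lightweight embedding per modality, trained jointly from scratch.
audio_embed = make_linear(AUDIO_FRAME, EMBED_DIM, seed=0)
image_embed = make_linear(PATCH_SIZE, EMBED_DIM, seed=1)

def early_fusion(audio_frames, image_patches):
    """Embed both modalities into one fused token sequence for the backbone."""
    tokens = [project(f, audio_embed) for f in audio_frames]
    tokens += [project(p, image_embed) for p in image_patches]
    return tokens

audio = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2]]
patches = [[0.9, 0.8, 0.7, 0.6]]
seq = early_fusion(audio, patches)
print(len(seq), len(seq[0]))  # -> 3 8 (three tokens, each EMBED_DIM wide)
```

The key design point is that the backbone sees a single token stream from the start, so cross-modal interactions are learned end to end rather than bolted on after separate encoders.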
Technical significance
Full-duplex interaction models represent a fundamental architectural departure from the turn-based paradigm that has defined AI assistants. By processing 200 ms chunks of input and output simultaneously, these models enable natural conversation with backchanneling and interruption. The 276B-parameter MoE architecture with only 12B active parameters — roughly 4% of the weights activated per token — suggests efficient inference is achievable at this scale.