Analyst memo

OpenAI Unveils Real-time Audio Models

OpenAI has released three new audio models via its Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, enhancing capabilities in voice reasoning, translation, and transcription.

Published May 9, 2026, 3:31 AM · Updated May 9, 2026, 3:31 AM

What happened

OpenAI has launched three new models for real-time audio processing: GPT-Realtime-2 for advanced voice reasoning, GPT-Realtime-Translate for speech translation across 70+ languages, and GPT-Realtime-Whisper for live transcription.
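As a rough illustration, a client would presumably target one of these models through the Realtime API's WebSocket endpoint. The sketch below follows the shape of OpenAI's existing Realtime API (endpoint URL, `Authorization` and `OpenAI-Beta` headers, and a `session.update` event); the model name is taken from this memo's announcement, and the session fields are illustrative assumptions rather than confirmed parameters for the new models.

```python
import json
import os
from urllib.parse import urlencode

# Base WebSocket endpoint used by OpenAI's Realtime API.
REALTIME_URL = "wss://api.openai.com/v1/realtime"

def build_connection(model: str, api_key: str) -> tuple[str, dict]:
    """Return the WebSocket URL and headers for a Realtime session."""
    url = f"{REALTIME_URL}?{urlencode({'model': model})}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",
    }
    return url, headers

def session_update(voice: str = "alloy") -> str:
    """Build a session.update payload a client could send after connecting.

    Field names mirror the current Realtime API and are assumptions here.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    })

if __name__ == "__main__":
    url, headers = build_connection(
        "gpt-realtime-translate",  # model name as announced in the memo
        os.environ.get("OPENAI_API_KEY", "sk-..."),
    )
    print(url)
```

An actual application would open the returned URL with a WebSocket client (for example the `websockets` package), send the `session.update` event, then stream audio frames; that part is omitted since it requires live credentials.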

Why it matters

The release advances real-time voice applications by letting developers combine reasoning, translation, and transcription in a single conversational pipeline rather than stitching together separate services.

Who is affected

Developers and businesses building voice-based applications are the primary beneficiaries: the new models could improve customer-facing voice interactions and automate speech-heavy workflows.

Risks / uncertainty

Accuracy in diverse real-world settings remains uncertain, particularly across varying speech patterns, accents, and the 70+ languages the translation model claims to support.