Audio Generation
Learn how to generate audio from a text or audio prompt.
In addition to generating text and images, some models enable you to generate a spoken audio response to a prompt, and to use audio inputs to prompt the model. Audio inputs can contain richer data than text alone, allowing the model to detect tone, inflection, and other nuances within the input. [!] OpenAI provides other models for simple speech to text and text to speech - when your task requires those conversions (and not dynamic content from a model), the TTS and STT models will be more performant and cost-efficient.
You can use these audio capabilities to:
Generate a spoken audio summary of a body of text (text in, audio out)
Perform sentiment analysis on a recording (audio in, text out)
Async speech to speech interactions with a model (audio in, audio out)
Quickstart
To generate audio or use audio as an input, you can use the chat completions endpoint in the REST API, as seen in the examples below. You can either use the REST API from the HTTP client of your choice, or use one of OpenAI's official SDKs for your preferred programming language.
The value of message.audio.id
above provides an identifier you can use in an assistant
message for a new /chat/completions
request, as in the example below.
FAQ
What modalities are supported by gpt-4o-audio-preview
gpt-4o-audio-preview
requires either audio output or audio input to be used at this time. Acceptable combinations of input and output are:
text in → text + audio out
audio in → text + audio out
audio in → text out
text + audio in → text + audio out
text + audio in → text out
How is audio in Chat Completions different from the Realtime API?
The underlying GPT-4o audio model is exactly the same. The Realtime API operates the same model at lower latency.
How do I think about audio input to the model in terms of tokens?
OpenAI is working on better tooling to expose this, but roughly one hour of audio input is equal to 128k tokens, the max context window currently supported by this model.
Last updated