Real-time models
Real-time models are designed to process and respond to inputs with minimal latency, enabling near-instantaneous interactions. They support multimodal communication through both voice and text interfaces, enabling seamless human-AI interactions.
AI Agents support real-time models from the following providers:
- OpenAI
- Azure OpenAI
- Amazon
- Google (Gemini)
You may use pre-deployed models or bring your own API keys from OpenAI or Microsoft Azure.
Configuring real-time models
To configure an agent to use a real-time model:
- Navigate to the Agents screen, locate your Agent, and click Edit.
- In the General tab:
  - From the 'Large language model' drop-down, choose the real-time model.
  - Set 'Max output tokens' to 1,000 or larger.
- In the Speech and Telephony tab, check the Enable voice streaming check box.
With this configuration, the real-time model interacts with the user via the voice modality: the input audio stream is sent directly to the model, and the model generates an audio stream in response. Speech-to-Text and Text-to-Speech services are therefore not used in this interaction mode.
If you clear the Enable voice streaming check box, the real-time model is used via the text modality. In this mode you need to choose Speech-to-Text and Text-to-Speech services, just as for regular LLM models.
Chats always use the text modality, regardless of the state of the Enable voice streaming check box.
Customizing the real-time model behavior
Use the following advanced configuration parameters to customize the real-time model behavior:
- openai_realtime – for OpenAI models
- gemini_audio – for Gemini models
- nova_sonic – for the Amazon model
For example:
{ "openai_realtime": { "voice": "coral" } }
Configuring input / output language
Real-time models have no explicit configuration for the input / output language. Instead, include the relevant instructions in your agent’s prompt.
For example:
Always respond to user in German.
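A slightly fuller instruction block you might add to the prompt (the wording here is only an illustration):
Always respond to the user in German.
If the user speaks another language, continue answering in German unless they explicitly ask you to switch.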
Feature parity
Agents that use real-time models benefit from most of the AI Agent platform features, including but not limited to:
- Documents
- Tools
- Multi-agent topologies
- Post call analysis
- Webhooks
The following limitations apply to agents that use real-time models:
- The RAG for every query document mode is not supported; it is implicitly switched to the RAG via doc_search tool mode.
- For multi-agent topologies, the real-time model should be configured for the “main” agent (the one that starts the conversation). This model is used for the entire conversation, and the LLM configuration of sub-agents is ignored.
- Real-time models generate significantly more output tokens than regular models because they output an audio stream. Therefore, you will typically need to increase the Max output tokens parameter on the agent configuration screen to 1,000 or larger.
- Static welcome messages are not supported for the real-time Gemini and Amazon models.
- The following advanced configuration parameters are not supported when real-time models are used:
  - call_transfer_conditions
  - language_detected_pass_question
  - language_detected_ignore_phrases
  - remove_symbols
  - replace_words
  - activity_params
  - session_params