Streaming mode
The streaming mode of AudioCodes Bot API is a WebSocket-based API that allows a voice bot to communicate with the AudioCodes VoiceAI Connect Enterprise platform.
A typical use case for voice bots is implementing LLM AI agents.
API Details
WebSocket Connection
At the start of a call, VoiceAI Connect Enterprise initiates a WebSocket connection with the bot using a predefined URL.
This connection remains active for the entire duration of the conversation.
It is used to send and receive messages for a single conversational session with the bot.
Authentication
An HTTP Authorization header is sent by VoiceAI Connect Enterprise on the creation of the WebSocket connection, containing a shared token. The token can be used by the bot server to authenticate and authorize the client.
OAuth 2.0 authentication is also supported, as described in OAuth 2.0 Authentication.
Example:
Authorization: Bearer <token>
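On the bot side, the check described above could look like the following minimal sketch. The function name is_authorized and the way the header value reaches it are assumptions for illustration; how the header is exposed depends on the WebSocket server library in use.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True if the header has the form 'Bearer <token>' with the shared token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A server would typically run this check during the WebSocket handshake and reject the connection when it returns False.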
Messages
All the messages transmitted over the WebSocket are JSON encoded, with media encoded in base64 within the JSON messages.
All the messages to and from the bot must contain the following shared field:
- type (string): The type of the message.
In addition, all the messages sent to the bot include the following field:
- conversationId (string): The unique identifier of the conversation.
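The shared fields suggest a small envelope check on each received text frame. The sketch below is illustrative only; parse_incoming and encode_outgoing are hypothetical helper names, not part of the API.

```python
import json


def parse_incoming(raw):
    """Decode a JSON text frame and check the fields every incoming message carries."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    for field in ("type", "conversationId"):
        if not isinstance(message.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return message


def encode_outgoing(message_type, **fields):
    """Encode a bot-to-platform message; 'type' is the only shared field in this direction."""
    return json.dumps({"type": message_type, **fields})
```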
Configuration
VoiceAI Connect Enterprise should be configured with a provider of type ac-api, specifying the URL of the bot in the botUrl parameter.
To enable using the AudioCodes Voice Bot API (instead of the textual AudioCodes Bot API), set the directSTT bot parameter to true.
To enable direct voice playback towards the user (without text-to-speech engine), set the directTTS bot parameter to true.
For bearer token authentication, set the token provider parameter to the token value.
API Messages
This section describes the API between the VoiceAI Connect Enterprise (client) and bot (server).
Messages from VoiceAI Connect Enterprise to bot
session.initiate
Sent upon establishment of the session.
The bot should respond to this message with a session.accepted message.
If the bot wishes to decline the conversation, it should respond with a session.error message.
Parameters:
- conversationId (string): A unique identifier for the conversation.
- type (string): The value "session.initiate".
- botName (string): The configured name of the bot.
- caller (string): The phone number of the caller.
- expectAudioMessages (boolean): Whether the bot is expected to play back audio messages. If true, the bot must not send message activities with textual prompts. Note: This field is set solely according to the directTTS bot parameter.
- supportedMediaFormats (array of strings): List of the supported audio coders, ordered by preference.
Supported media formats:
- raw/mulaw: Mu-Law encoded (8 bit, 8 kHz) without a header
- wav/mulaw: Mu-Law encoded (8 bit, 8 kHz) with a WAV header
- raw/lpcm16: Linear PCM (16 bit, 16 kHz) without a header
- wav/lpcm16: Linear PCM (16 bit, 16 kHz) with a WAV header
- raw/lpcm16_8: Linear PCM (16 bit, 8 kHz) without a header
- wav/lpcm16_8: Linear PCM (16 bit, 8 kHz) with a WAV header
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "session.initiate", "botName": "my_bot_name", "caller": "+1234567890", "expectAudioMessages": true, "supportedMediaFormats": [ "raw/lpcm16" ] }
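A bot might answer session.initiate by choosing the first offered format it can handle, since the list is ordered by preference. This is a sketch under assumptions: BOT_FORMATS and answer_initiate are hypothetical names, and the reason text is arbitrary.

```python
# Formats this hypothetical bot can process.
BOT_FORMATS = {"raw/lpcm16", "raw/lpcm16_8", "raw/mulaw"}


def answer_initiate(message, bot_formats=BOT_FORMATS):
    """Build the reply to session.initiate: session.accepted or session.error."""
    # supportedMediaFormats is ordered by preference, so take the first match.
    for media_format in message.get("supportedMediaFormats", []):
        if media_format in bot_formats:
            return {"type": "session.accepted", "mediaFormat": media_format}
    return {"type": "session.error", "reason": "No common media format"}
```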
activities
VoiceAI Connect Enterprise activities are sent to the bot using the activities message, which contains an activities list.
The following sections show several example activities.
Call initiation
The start activity is sent when the call starts, as described in Call initiation.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "activities", "activities": [ { "type": "event", "name": "start", "id": "582bbc43-0ef7-47e9-97b4-1e6141625b01", "timestamp": "2022-07-20T07:15:48.239Z", "language": "en-US", "parameters": { "locale": "en-US", "caller": "caller-id", "callee": "my_bot_name" } } ] }
DTMF
The dtmf activity is sent when the user presses a DTMF (dual-tone multi-frequency) digit, as described in Receiving DTMF digits notification.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "activities", "activities": [ { "type": "event", "name": "dtmf", "id": "582bbc43-0ef7-47e9-97b4-1e6141625b01", "timestamp": "2022-07-20T07:15:48.239Z", "language": "en-US", "value": "123" } ] }
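The two example activities above could be picked out of an incoming activities message as follows. This is only a sketch: handle_activities is a hypothetical helper, and it returns simple (name, payload) pairs rather than acting on them.

```python
def handle_activities(message):
    """Return (event name, payload) pairs for the start and dtmf events in a message."""
    events = []
    for activity in message.get("activities", []):
        if activity.get("type") != "event":
            continue
        name = activity.get("name")
        if name == "start":
            # The caller is carried in the activity's parameters object.
            events.append(("start", activity.get("parameters", {}).get("caller")))
        elif name == "dtmf":
            # The pressed digits arrive as a string in the value field.
            events.append(("dtmf", activity.get("value")))
    return events
```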
userStream.start
A userStream.start message is sent by VoiceAI Connect Enterprise to indicate a request to start audio streaming to the bot.
The bot should respond to this message with a userStream.started message. After receiving this response message, VoiceAI Connect Enterprise starts sending the audio chunks using userStream.chunk messages.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "userStream.start" }
userStream.chunk
A userStream.chunk message is sent by VoiceAI Connect Enterprise to stream audio data to the bot.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "userStream.chunk", "audioChunk": "Base64EncodedAudioData" }
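Decoding a chunk is a plain base64 operation. The sketch below also estimates the chunk duration, assuming the chunks carry headerless audio in the negotiated media format; that assumption, and the names decode_chunk and duration_ms, are not stated by the API description.

```python
import base64

# Bytes per second for the headerless formats: bytes per sample * sample rate.
BYTES_PER_SECOND = {
    "raw/mulaw": 1 * 8000,
    "raw/lpcm16": 2 * 16000,
    "raw/lpcm16_8": 2 * 8000,
}


def decode_chunk(message):
    """Return the raw audio bytes carried in a userStream.chunk message."""
    return base64.b64decode(message["audioChunk"])


def duration_ms(audio, media_format):
    """Estimate the playing time of headerless audio in milliseconds."""
    return 1000 * len(audio) / BYTES_PER_SECOND[media_format]
```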
userStream.stop
A userStream.stop message is sent by VoiceAI Connect Enterprise to indicate the end of audio streaming.
The bot should respond to this message with a userStream.stopped message.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "userStream.stop" }
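The start, chunk, and stop messages form a small handshake that can be tracked with a single flag. The class below is an illustrative sketch; UserStream is a hypothetical name, and dropping chunks that arrive outside an active stream is a design choice of this example rather than documented behavior.

```python
import base64


class UserStream:
    """Tracks the userStream.start / chunk / stop exchange for one conversation."""

    def __init__(self):
        self.active = False
        self.buffer = bytearray()

    def handle(self, message):
        """Process one userStream.* message and return the reply to send, if any."""
        kind = message["type"]
        if kind == "userStream.start":
            self.active = True
            self.buffer.clear()
            return {"type": "userStream.started"}
        if kind == "userStream.chunk":
            if self.active:  # ignore audio outside an active stream
                self.buffer.extend(base64.b64decode(message["audioChunk"]))
            return None
        if kind == "userStream.stop":
            self.active = False
            return {"type": "userStream.stopped"}
        return None
```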
session.resume
If the WebSocket connection is lost, VoiceAI Connect Enterprise attempts to reconnect and sends a session.resume message to the bot.
The bot should respond to this message with a session.accepted message.
If the bot wishes to decline the reconnection, it should respond with a session.error message.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "session.resume" }
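One way to answer session.resume is to look the conversation up in whatever state the bot kept from session.initiate. In this sketch, known_sessions is a hypothetical mapping from conversationId to the previously chosen media format, and echoing that format in session.accepted is an assumption.

```python
def answer_resume(message, known_sessions):
    """Accept the reconnection if the conversation is known, otherwise decline it."""
    media_format = known_sessions.get(message["conversationId"])
    if media_format is None:
        return {"type": "session.error", "reason": "Unknown conversation"}
    return {"type": "session.accepted", "mediaFormat": media_format}
```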
session.end
A session.end message is sent by VoiceAI Connect Enterprise to indicate the end of the conversation.
Example:
{ "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588", "type": "session.end", "reasonCode": "client-disconnected", "reason": "Client Side" }
Messages from bot to VoiceAI Connect Enterprise
session.accepted
The session.accepted message is sent in response to either the session.initiate or session.resume messages.
Parameters:
- mediaFormat (string): The chosen media format. Must be one of the supported media formats specified in the session.initiate message.
Example:
{ "type": "session.accepted", "mediaFormat": "raw/lpcm16" }
userStream.started
The userStream.started message is sent in response to the userStream.start message, to indicate that the bot is ready to receive audio chunks.
Example:
{ "type": "userStream.started" }
userStream.stopped
The userStream.stopped message is sent in response to the userStream.stop message, to indicate that the bot will not accept any more audio chunks.
Example:
{ "type": "userStream.stopped" }
userStream.speech.hypothesis
The userStream.speech.hypothesis message is sent by the bot to provide partial recognition results.
Using this message is recommended, as VoiceAI Connect Enterprise relies on it for performing barge-in.
Parameters:
- alternatives (array of objects): A list of recognition alternatives.
  - text (string): The recognized text.
Example:
{ "type": "userStream.speech.hypothesis", "alternatives": [ { "text": "How are" } ] }
userStream.speech.recognition
The userStream.speech.recognition message is sent by the bot to provide the final recognition result.
Using this message is recommended mainly for logging purposes.
Parameters:
- alternatives (array of objects): A list of recognition alternatives.
  - text (string): The recognized text.
  - confidence (number): The confidence level of the recognition result, between 0 and 1.
Example:
{ "type": "userStream.speech.recognition", "alternatives": [ { "text": "How are you.", "confidence": 0.83 } ] }
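The two recognition messages differ only in their type and the optional confidence field, so they can be built with two small helpers. The helper names are hypothetical, and each sketch emits a single alternative.

```python
def hypothesis(text):
    """Build a partial-result message (relied on by the platform for barge-in)."""
    return {"type": "userStream.speech.hypothesis", "alternatives": [{"text": text}]}


def recognition(text, confidence):
    """Build a final-result message; confidence must lie between 0 and 1."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "type": "userStream.speech.recognition",
        "alternatives": [{"text": text, "confidence": confidence}],
    }
```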
activities
The bot can send activities to VoiceAI Connect Enterprise using the activities message, containing an activities list.
Playing an audio buffer to the user
To play an audio buffer to the user, the bot should send a playUrl activity with the playUrlUrl field containing the audio data as a Data URI.
The Data URI contains the audio in base64 encoding with the prefix data:audio/wav;base64, or data:application/octet-stream;base64, depending on whether a WAV header is used.
The playUrlMediaFormat field must contain the media format of the audio data.
The playUrlAltText field is optional and can be used to provide the corresponding text of the audio.
Example:
{ "type": "activities", "activities": [ { "type": "event", "name": "playUrl", "activityParams": { "playUrlAltText": "Welcome to our example", "playUrlUrl": "data:audio/wav;base64,UklGRmK4AABXQVZFZm10IBIAAAAGAA...", "playUrlMediaFormat": "wav/lpcm16" } } ] }
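Building the playUrl activity amounts to base64-encoding the audio and choosing the Data URI prefix from the media format. The sketch below uses Python's standard wave module to add a WAV header; the helper names, and the choice of mono audio, are assumptions of this example.

```python
import base64
import io
import wave


def wav_bytes(pcm, sample_rate=16000):
    """Wrap 16-bit PCM samples in a WAV header (mono is assumed here)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buf.getvalue()


def play_audio(audio, media_format, alt_text=None):
    """Build an activities message that plays the given audio buffer."""
    # wav/* formats carry a WAV header; raw/* formats do not.
    mime = "audio/wav" if media_format.startswith("wav/") else "application/octet-stream"
    encoded = base64.b64encode(audio).decode("ascii")
    params = {
        "playUrlUrl": f"data:{mime};base64,{encoded}",
        "playUrlMediaFormat": media_format,
    }
    if alt_text is not None:
        params["playUrlAltText"] = alt_text
    activity = {"type": "event", "name": "playUrl", "activityParams": params}
    return {"type": "activities", "activities": [activity]}
```

For example, play_audio(wav_bytes(pcm), "wav/lpcm16", "Welcome") would produce a message shaped like the example above, given pcm holding 16 kHz, 16-bit samples.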
Disconnecting the call
To disconnect the call, the bot should send a hangup activity, as described in Disconnecting the call.
Upon receiving the hangup activity, VoiceAI Connect Enterprise sends a session.end message, disconnects the call, and closes the WebSocket connection.
Example:
{ "type": "activities", "activities": [ { "type": "event", "name": "hangup" } ] }
session.error
The session.error message is sent by the bot to report a fatal error. Upon receiving this message, VoiceAI Connect Enterprise disconnects the call and closes the WebSocket connection.
Parameters:
- reason (string): The error message.
Example:
{ "type": "session.error", "reason": "Internal Server Error" }
Example call flow