Streaming mode

The streaming mode of AudioCodes Bot API is a WebSocket-based API that allows a voice bot to communicate with the AudioCodes VoiceAI Connect Enterprise platform.

A typical use case for voice bots is implementing LLM AI agents.

This page describes the Voice Bot API. For the textual Bot API, see Chat mode.

API Details

WebSocket Connection

At the start of a call, VoiceAI Connect Enterprise initiates a WebSocket connection with the bot using a predefined URL.

This connection remains active for the entire duration of the conversation.

It is used to send and receive messages for a single conversational session with the bot.

Authentication

An HTTP Authorization header is sent by VoiceAI Connect Enterprise on the creation of the WebSocket connection, containing a shared token. The token can be used by the bot server to authenticate and authorize the client.

OAuth 2.0 authentication is also supported, as described in OAuth 2.0 Authentication.

Example:

Authorization: Bearer <token>
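
A minimal sketch of how a bot server might check this header, assuming the handshake headers are available as a dictionary. The function name is_authorized and the EXPECTED_TOKEN value are hypothetical and not part of the API; the constant-time comparison is a general precaution rather than a requirement stated here.

```python
import hmac

# Hypothetical shared token; it would match the token configured on the provider.
EXPECTED_TOKEN = "my-shared-token"


def is_authorized(headers: dict, expected_token: str = EXPECTED_TOKEN) -> bool:
    """Return True if the Authorization header carries the expected bearer token."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A server would typically reject the WebSocket handshake when this check returns False.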

Messages

All the messages transmitted over the WebSocket are JSON encoded, with media encoded in base64 within the JSON messages.

All the messages to and from the bot must contain the following shared field:

type (string): The type of the message.

In addition, all the messages sent to the bot include the following field:

conversationId (string): The unique identifier of the conversation.
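
As an illustration of these shared fields, the sketch below decodes a text frame received by the bot and checks that both fields are present. The helper name parse_incoming is hypothetical, and raising ValueError on a malformed message is one possible policy, not one defined by the API.

```python
import json


def parse_incoming(raw: str) -> dict:
    """Decode a JSON message sent to the bot and check the shared fields."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    # Messages sent to the bot carry both "type" and "conversationId".
    for field in ("type", "conversationId"):
        if not isinstance(message.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return message
```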

Configuration

VoiceAI Connect Enterprise should be configured with a provider of type ac-api, specifying the URL of the bot in the botUrl parameter.

To enable using the AudioCodes Voice Bot API (instead of the textual AudioCodes Bot API), set the directSTT bot parameter to true.

To enable direct voice playback towards the user (without a text-to-speech engine), set the directTTS bot parameter to true.

For bearer token authentication, set the token provider parameter to the token value.

API Messages

This section describes the API between VoiceAI Connect Enterprise (the client) and the bot (the server).

Messages from VoiceAI Connect Enterprise to bot

session.initiate

Sent upon establishment of the session.

The bot should respond to this message with a session.accepted message.

If the bot wishes to decline the conversation, it should respond with a session.error message.

Parameters:

conversationId (string): A unique identifier for the conversation.
type (string): The value "session.initiate".
botName (string): The configured name of the bot.
caller (string): The phone number of the caller.
expectAudioMessages (boolean): Whether the bot is expected to play back audio messages. If true, the bot must not send message activities with textual prompts. Note: This field is set solely according to the directTTS bot parameter.
supportedMediaFormats (array of strings): List of the supported audio coders, ordered by preference.

Supported media formats:

raw/mulaw: Mu-Law encoded (8 bit, 8 kHz) without a header
wav/mulaw: Mu-Law encoded (8 bit, 8 kHz) with a WAV header
raw/lpcm16: Linear PCM (16 bit, 16 kHz) without a header
wav/lpcm16: Linear PCM (16 bit, 16 kHz) with a WAV header
raw/lpcm16_8: Linear PCM (16 bit, 8 kHz) without a header
wav/lpcm16_8: Linear PCM (16 bit, 8 kHz) with a WAV header

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "session.initiate",
  "botName": "my_bot_name",
  "caller": "+1234567890",
  "expectAudioMessages": true,
  "supportedMediaFormats": [
    "raw/lpcm16"
  ]
}
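
The negotiation described above can be sketched as follows. BOT_FORMATS and respond_to_initiate are hypothetical names, and declining with session.error when no offered format is usable is an assumption about bot policy; the API only states that the chosen format must be one of those offered.

```python
# Formats this hypothetical bot can handle.
BOT_FORMATS = ["raw/lpcm16", "raw/mulaw"]


def respond_to_initiate(message: dict, bot_formats=BOT_FORMATS) -> dict:
    """Build the reply to session.initiate: session.accepted or session.error."""
    offered = message.get("supportedMediaFormats", [])
    # The offered list is ordered by preference, so take the first usable entry.
    for media_format in offered:
        if media_format in bot_formats:
            return {"type": "session.accepted", "mediaFormat": media_format}
    return {"type": "session.error", "reason": "No common media format"}
```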

activities

VoiceAI Connect Enterprise activities are sent to the bot using the activities message, with an activities list.

The following sections show several example activities.

Call initiation

The start activity is sent when the call starts, as described in Call initiation.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "activities",
  "activities": [
    {
      "type": "event",
      "name": "start",
      "id": "582bbc43-0ef7-47e9-97b4-1e6141625b01",
      "timestamp": "2022-07-20T07:15:48.239Z",
      "language": "en-US",
      "parameters": {
        "locale": "en-US",
        "caller": "caller-id",
        "callee": "my_bot_name"
      }
    }
  ]
}

DTMF

The dtmf activity is sent when the user presses a DTMF (dual-tone multi-frequency) digit, as described in Receiving DTMF digits notification.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "activities",
  "activities": [
    {
      "type": "event",
      "name": "dtmf",
      "id": "582bbc43-0ef7-47e9-97b4-1e6141625b01",
      "timestamp": "2022-07-20T07:15:48.239Z",
      "language": "en-US",
      "value": "123"
    }
  ]
}
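
A minimal sketch of reading such events on the bot side, assuming the message has already been decoded into a dictionary. The helper name collect_dtmf_digits is hypothetical.

```python
def collect_dtmf_digits(message: dict) -> str:
    """Concatenate the digits of all dtmf events in an activities message."""
    digits = []
    for activity in message.get("activities", []):
        # Only event activities named "dtmf" carry digits in "value".
        if activity.get("type") == "event" and activity.get("name") == "dtmf":
            digits.append(str(activity.get("value", "")))
    return "".join(digits)
```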

userStream.start

A userStream.start message is sent by VoiceAI Connect Enterprise to indicate a request to start audio streaming to the bot.

The bot should respond to this message with a userStream.started message. After receiving this response message, VoiceAI Connect Enterprise starts sending the audio chunks using userStream.chunk messages.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "userStream.start"
}

userStream.chunk

A userStream.chunk message is sent by VoiceAI Connect Enterprise to stream audio data to the bot.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "userStream.chunk",
  "audioChunk": "Base64EncodedAudioData"
}
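
The sketch below decodes the audioChunk field and estimates how much audio a chunk holds. The byte rates follow the bit depths and sample rates listed under session.initiate; single-channel audio is an assumption, and only the headerless raw formats are covered. The function names are hypothetical.

```python
import base64

# Bytes per second for the headerless formats, assuming mono audio.
BYTES_PER_SECOND = {
    "raw/mulaw": 8000 * 1,     # 8 bit, 8 kHz
    "raw/lpcm16": 16000 * 2,   # 16 bit, 16 kHz
    "raw/lpcm16_8": 8000 * 2,  # 16 bit, 8 kHz
}


def decode_chunk(message: dict) -> bytes:
    """Return the raw audio bytes carried by a userStream.chunk message."""
    return base64.b64decode(message["audioChunk"])


def chunk_duration_ms(audio: bytes, media_format: str) -> float:
    """Estimate the duration of a raw audio buffer in milliseconds."""
    return len(audio) * 1000 / BYTES_PER_SECOND[media_format]
```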

userStream.stop

A userStream.stop message is sent by VoiceAI Connect Enterprise to indicate the end of audio streaming.

The bot should respond to this message with a userStream.stopped message.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "userStream.stop"
}
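
The start, chunk, and stop messages can be handled together by a small state tracker, sketched below. The UserStream class is hypothetical; buffering all audio in memory and silently ignoring chunks received outside an active stream are simplifying assumptions.

```python
import base64


class UserStream:
    """Tracks one user audio stream and produces the replies the bot should send."""

    def __init__(self):
        self.active = False
        self.buffer = bytearray()

    def handle(self, message: dict):
        """Process a userStream.* message; return the reply to send, or None."""
        kind = message["type"]
        if kind == "userStream.start":
            self.active = True
            self.buffer.clear()
            return {"type": "userStream.started"}
        if kind == "userStream.chunk":
            # Chunks need no reply; keep the audio only while streaming is active.
            if self.active:
                self.buffer.extend(base64.b64decode(message["audioChunk"]))
            return None
        if kind == "userStream.stop":
            self.active = False
            return {"type": "userStream.stopped"}
        return None
```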

session.resume

If the WebSocket connection is lost, VoiceAI Connect Enterprise attempts to reconnect and sends a session.resume message to the bot.

The bot should respond to this message with a session.accepted message.

If the bot wishes to decline the reconnection, it should respond with a session.error message.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "session.resume"
}
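
One way a bot might decide between the two replies is sketched below. The known_sessions mapping (conversation ID to previously chosen media format) and the function name are hypothetical, and replying with the earlier media format on resume is an assumption rather than documented behavior.

```python
def respond_to_resume(message: dict, known_sessions: dict) -> dict:
    """Accept a resume for a known conversation, otherwise decline it."""
    # known_sessions maps conversationId to the media format chosen earlier.
    media_format = known_sessions.get(message["conversationId"])
    if media_format is None:
        return {"type": "session.error", "reason": "Unknown conversation"}
    return {"type": "session.accepted", "mediaFormat": media_format}
```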

session.end

A session.end message is sent by VoiceAI Connect Enterprise to indicate the end of the conversation.

Example:

{
  "conversationId": "4a5b4b9d-dab7-42d0-a977-6740c9349588",
  "type": "session.end",
  "reasonCode": "client-disconnected",
  "reason": "Client Side"
}

Messages from bot to VoiceAI Connect Enterprise

session.accepted

The session.accepted message is sent in response to either the session.initiate or session.resume messages.

Parameters:

mediaFormat (string): The chosen media format. Must be one of the supported media formats specified in the session.initiate message.

Example:

{
  "type": "session.accepted",
  "mediaFormat": "raw/lpcm16"
}

userStream.started

The userStream.started message is sent in response to the userStream.start message, to indicate that the bot is ready to receive audio chunks.

Example:

{
  "type": "userStream.started"
}

userStream.stopped

The userStream.stopped message is sent in response to the userStream.stop message, to indicate that the bot will not accept any more audio chunks.

Example:

{
  "type": "userStream.stopped"
}

userStream.speech.hypothesis

The userStream.speech.hypothesis message is sent by the bot to provide partial recognition results.

Using this message is recommended, as VoiceAI Connect Enterprise relies on it for performing barge-in.

Parameters:

alternatives (array of objects): A list of recognition alternatives.
  text (string): The recognized text.

Example:

{
  "type": "userStream.speech.hypothesis",
  "alternatives": [
    {
      "text": "How are"
    }
  ]
}

userStream.speech.recognition

The userStream.speech.recognition message is sent by the bot to provide the final recognition result.

Using this message is recommended mainly for logging purposes.

Parameters:

alternatives (array of objects): A list of recognition alternatives.
  text (string): The recognized text.
  confidence (number): The confidence level of the recognition result, between 0 and 1.

Example:

{
  "type": "userStream.speech.recognition",
  "alternatives": [
    {
      "text": "How are you.",
      "confidence": 0.83
    }
  ]
}
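
The two speech messages can be built with small helpers such as the ones sketched below. The function names are hypothetical, and each message carries a single alternative for brevity; rejecting a confidence outside the documented 0 to 1 range is a local choice.

```python
def hypothesis_message(text: str) -> dict:
    """Build a userStream.speech.hypothesis message with one partial result."""
    return {
        "type": "userStream.speech.hypothesis",
        "alternatives": [{"text": text}],
    }


def recognition_message(text: str, confidence: float) -> dict:
    """Build a userStream.speech.recognition message with one final result."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "type": "userStream.speech.recognition",
        "alternatives": [{"text": text, "confidence": confidence}],
    }
```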

activities

The bot can send activities to VoiceAI Connect Enterprise using the activities message, containing an activities list.

Playing an audio buffer to the user

For playing an audio buffer to the user, the bot should send a playUrl activity with the playUrlUrl field containing the audio data as a Data URI.

The Data URI contains the audio in base64 encoding with the prefix data:audio/wav;base64, or data:application/octet-stream;base64, depending on whether a WAV header is used.

The playUrlMediaFormat field must contain the media format of the audio data.

The playUrlAltText field is optional and can be used to provide the corresponding text of the audio.

Example:

{
  "type": "activities",
  "activities": [
    {
      "type": "event",
      "name": "playUrl",
      "activityParams": {
        "playUrlAltText": "Welcome to our example",
        "playUrlUrl": "data:audio/wav;base64,UklGRmK4AABXQVZFZm10IBIAAAAGAA...",
        "playUrlMediaFormat": "wav/lpcm16"
      }
    }
  ]
}
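
A minimal sketch of building this activity from an in-memory audio buffer. The function name play_audio_activity is hypothetical; the Data URI prefix is selected from the media format name, on the assumption that wav/ formats carry a WAV header and raw/ formats do not, as in the list of supported media formats.

```python
import base64
from typing import Optional


def play_audio_activity(audio: bytes, media_format: str,
                        alt_text: Optional[str] = None) -> dict:
    """Wrap an audio buffer in an activities message with a playUrl event."""
    # Pick the Data URI prefix according to whether a WAV header is present.
    if media_format.startswith("wav/"):
        prefix = "data:audio/wav;base64,"
    else:
        prefix = "data:application/octet-stream;base64,"
    params = {
        "playUrlUrl": prefix + base64.b64encode(audio).decode("ascii"),
        "playUrlMediaFormat": media_format,
    }
    # playUrlAltText is optional, so include it only when provided.
    if alt_text is not None:
        params["playUrlAltText"] = alt_text
    return {
        "type": "activities",
        "activities": [
            {"type": "event", "name": "playUrl", "activityParams": params}
        ],
    }
```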

Disconnecting the call

To disconnect the call, the bot should send a hangup activity, as described in Disconnecting the call.

Note: Closing the WebSocket connection doesn't disconnect the call, as VoiceAI Connect Enterprise attempts to reconnect. When processing the hangup activity, VoiceAI Connect Enterprise sends a session.end message, disconnects the call, and closes the WebSocket connection.

Example:

{
  "type": "activities",
  "activities": [
    {
      "type": "event",
      "name": "hangup"
    }
  ]
}

session.error

The session.error message is sent by the bot to report a fatal error. Upon receiving this message, VoiceAI Connect Enterprise disconnects the call and closes the WebSocket connection.

Parameters:

reason (string): The error message.

Example:

{
  "type": "session.error",
  "reason": "Internal Server Error"
}
