Playing text to the user

A core capability of VoiceAI Connect is conveying the bot's messages to the user as speech.

For bots using a textual channel, VoiceAI Connect first performs text-to-speech on the bot's message using a third-party service, and then plays the synthesized speech to the user. By default, all text-to-speech responses are cached by VoiceAI Connect.

For bots using a voice channel, the speech is synthesized by the bot (or, usually, bot framework), and played to the user by VoiceAI Connect. In such cases, no caching is done by VoiceAI Connect.

The bot's messages are usually sent as a response to a user utterance (or another event that is sent to the bot). However, several bot frameworks allow sending asynchronous messages that are triggered by the bot's logic (e.g., a timer or a long-running query). VoiceAI Connect supports both methods.

Depending on the text-to-speech provider, Speech Synthesis Markup Language (SSML) can be used in the bot's textual messages to customize the audio response, by specifying pauses and the rendering of acronyms, dates, times, abbreviations, or prosody. VoiceAI Connect forwards to the text-to-speech service any SSML that is received from the bot.


How do I use it?

Each bot framework sends textual messages in a different way:

See the Sending activities page for instructions on how to send activities using your bot framework.

AudioCodes Bot API

Using the message activity.

Example:

{
  "type": "message",
  "text": "Hi."
}
Microsoft Bot Framework

Using the message activity.

Example:

{
  "type": "message",
  "text": "Hi."
}
Dialogflow CX

Using an intent fulfillment with type "Agent says".

Dialogflow ES

Using an intent response with type "Text Response".

Note: VoiceAI Connect only uses the responses of the DEFAULT platform.

Amazon Lex V2

Using the Amazon Lex V2 UI editor.

Text-to-speech caching

By default, VoiceAI Connect caches all text-to-speech responses. The cache prevents repeated activations of the text-to-speech service when the same text is synthesized multiple times.

The cache size can be controlled by the administrator of the VoiceAI Connect installation.

To disable the cache for a specific response from the bot or for the whole duration of the call, the following bot configuration parameter can be used:

Parameter: disableTtsCache

Type: Boolean

Description: Defines caching of text-to-speech (audio) results of bot responses.

  • true: Text-to-speech caching is disabled.

  • false: (Default) Text-to-speech caching is enabled.

Note: This parameter is not applicable when using a voice bot channel (i.e., text-to-speech is performed by the bot framework).

The following example shows how your bot can disable caching of a specific response:

AudioCodes Bot API
{
  "type": "message",
  "text": "I have something sensitive to tell you.",
  "activityParams": {
    "disableTtsCache": true
  }
}
Microsoft Bot Framework
{
  "type": "message",
  "text": "I have something sensitive to tell you.",
  "channelData": {
    "activityParams": {
      "disableTtsCache": true
    }
  }
}
Dialogflow CX

Add a Custom Payload fulfillment to disable caching of a single agent response:

{
  "activityParams": {
    "disableTtsCache": true
  }
}
Dialogflow ES

Add a Custom Payload response to disable caching of a single agent response:

{
  "activityParams": {
    "disableTtsCache": true
  }
}
Amazon Lex V2

Add a Custom payload to the message:

{
  "activityParams": {
    "disableTtsCache": true
  }
}

Controlling internal audio buffer size

If there is significant jitter in the network, increasing the buffer size between the audio providers (text-to-speech or remote URL) and the SIP side can help mitigate the problem.

To increase the audio buffer size, the following bot configuration parameter can be used:

Parameter: playMaxBufferTimeMS

Type: Numeric

Description: Defines the maximum buffer size used between audio providers (text-to-speech or remote URL) and the SIP side.

Range: 0-5000 (milliseconds of audio)

Default value: 0 (no increased buffer)

This parameter is applicable only to VoiceAI Connect Enterprise Version 3.14 and later.
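As a sketch, a bot using the AudioCodes Bot API could raise the buffer at runtime by sending the parameter in sessionParams on a config event; the value of 1000 is an arbitrary illustration, so verify the exact mechanism and an appropriate value against your VoiceAI Connect version:

{
  "type": "event",
  "name": "config",
  "sessionParams": {
    "playMaxBufferTimeMS": 1000
  }
}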

Handling text-to-speech playback error

You can define how VoiceAI Connect responds when it receives a text-to-speech playback error from the AudioCodes SBC. Only the admin can configure this parameter.

Parameter: ignorePlaybackError

Type: Boolean

Description: Defines how VoiceAI Connect responds when it receives a text-to-speech playback error from the SBC.

  • true: VoiceAI Connect ignores playback errors received from the SBC.

  • false: (Default) If VoiceAI Connect receives a playback error from the SBC, it ends the call with the failure code SBCPlaybackError.

This parameter is applicable only to VoiceAI Connect Enterprise Version 3.14 and later.

Using SSML

The bot can send Speech Synthesis Markup Language (SSML) XML elements within its textual response.

VoiceAI Connect adapts the received text to the format expected by the text-to-speech provider (e.g., adding the <speak> element if needed).

The SSML itself is interpreted by the text-to-speech provider. Refer to the provider's documentation for the list of supported features.

When using SSML, all characters that are invalid in XML, such as the ampersand (&), must be properly escaped.
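For example, a bot using the AudioCodes Bot API might send a message whose text is wrapped in a <speak> element, adding a pause with <break> and escaping the ampersand as &amp; (which SSML elements are honored depends on the text-to-speech provider):

{
  "type": "message",
  "text": "<speak>Your order from Smith &amp; Sons ships <break time=\"500ms\"/> on Monday.</speak>"
}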