Playing text to the user
A basic functionality of VoiceAI Connect is passing the bot's messages to the user as speech.
For bots using a textual channel, this is done by first performing text-to-speech on the bot's message using a third-party service, and then playing the synthesized speech to the user. By default, all text-to-speech responses are cached by VoiceAI Connect.
For bots using a voice channel, the speech is synthesized by the bot (or, usually, bot framework), and played to the user by VoiceAI Connect. In such cases, no caching is done by VoiceAI Connect.
The bot's messages are usually sent as a response to a user utterance (or another event that is sent to the bot). However, several bot frameworks allow sending asynchronous messages that are triggered by the bot's logic (e.g., a timer or a query that takes time). VoiceAI Connect supports both methods.
Depending on the text-to-speech provider, Speech Synthesis Markup Language (SSML) can be used in the bot textual messages to allow for more customization in the audio response, by providing details on pauses, and audio formatting for acronyms, dates, times, abbreviations, or prosody. VoiceAI Connect forwards to the text-to-speech service any SSML that is received from the bot.
How do I use it?
Each bot framework uses a different way of sending textual messages; see the Sending activities page for instructions on how to send activities using your bot framework:
- Using the message activity. Example:
  { "type": "message", "text": "Hi." }
- Using an intent fulfillment with type "Agent says".
- Using an intent response with type "Text Response". Note: VoiceAI Connect only uses the responses of the DEFAULT platform.
- Using the Amazon Lex V2 UI editor.
Text-to-speech caching
By default, VoiceAI Connect caches all text-to-speech responses. This cache prevents repeated activations of the text-to-speech service when the same text is played multiple times.
The cache size can be controlled by the administrator of the VoiceAI Connect installation.
To disable the cache for a specific response from the bot or for the whole duration of the call, the following bot configuration parameter can be used:
Parameter | Type | Description
---|---|---
disableTtsCache | Boolean | Defines caching of the text-to-speech (audio) result of a bot response; set it to true to disable caching. Note: This parameter is not applicable when using a voice bot channel (i.e., text-to-speech is performed by the bot framework).
The following examples show how your bot can disable caching of a specific response:

- Using the message activity:
  { "type": "message", "text": "I have something sensitive to tell you.", "activityParams": { "disableTtsCache": true } }
- Using the message activity with the parameters under channelData:
  { "type": "message", "text": "I have something sensitive to tell you.", "channelData": { "activityParams": { "disableTtsCache": true } } }
- Adding a Custom Payload fulfillment that disables caching of a single agent response:
  { "activityParams": { "disableTtsCache": true } }
- Adding a Custom Payload response that disables caching of a single agent response:
  { "activityParams": { "disableTtsCache": true } }
- Adding a Custom payload in the message:
  { "activityParams": { "disableTtsCache": true } }
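The examples above disable caching for a single response only. Since the cache can also be disabled for the whole duration of the call, a bot could instead send the parameter once under sessionParams; this is a sketch that assumes the standard VoiceAI Connect sessionParams mechanism for call-scoped parameters:

```json
{
  "type": "message",
  "text": "Everything I say from now on is sensitive.",
  "sessionParams": {
    "disableTtsCache": true
  }
}
```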
Controlling internal audio buffer size
If there is significant jitter in the network, increasing the buffer size between the audio providers (text-to-speech or a remote URL) and the SIP side can help mitigate the problem.
To increase the audio buffer size, the following bot configuration parameter can be used:
Parameter | Type | Description
---|---|---
 | Numeric | Defines the maximum buffer size used between audio providers (text-to-speech or remote URL) and the SIP side. Range: 0–5000 milliseconds of audio. Default: 0 (no increased buffer). Note: This parameter is applicable only to VoiceAI Connect Enterprise Version 3.14 and later.
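Since the table above does not name the parameter, the key below is a placeholder; this sketch only illustrates how a numeric bot configuration parameter could be applied for the rest of the call under sessionParams (here buffering up to one second of audio), assuming the same sessionParams mechanism used for other call-scoped parameters:

```json
{
  "type": "message",
  "text": "Please hold while I check.",
  "sessionParams": {
    "<buffer-size-parameter>": 1000
  }
}
```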
Handling text-to-speech playback error
You can define how VoiceAI Connect responds when it receives a text-to-speech playback error from the AudioCodes SBC. Only the administrator can configure this parameter.
Parameter | Type | Description
---|---|---
 | Boolean | Defines how VoiceAI Connect responds when it receives a text-to-speech playback error from the SBC. Note: This parameter is applicable only to VoiceAI Connect Enterprise Version 3.14 and later.
Using SSML
The bot can send Speech Synthesis Markup Language (SSML) XML elements within its textual response, in one of the following ways:
- A full SSML document, for example:
  <speak> This is <say-as interpret-as="characters">SSML</say-as>. </speak>
- Text with SSML tags, for example:
  This is <say-as interpret-as="characters">SSML</say-as>.

VoiceAI Connect adapts the received text to the format expected by the text-to-speech provider (e.g., adding the <speak> element if needed).
The SSML is handled by the text-to-speech provider; refer to the provider's documentation for the list of supported SSML features.
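Putting the pieces together, a bot on a textual channel could send SSML inside an ordinary message activity (a sketch using the message format shown earlier; the inner quotes are escaped as required by plain JSON):

```json
{
  "type": "message",
  "text": "<speak>This is <say-as interpret-as=\"characters\">SSML</say-as>.</speak>"
}
```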