Text-to-speech service information

To connect VoiceAI Connect to a text-to-speech service provider, certain information is required from the provider. This information is then used in the VoiceAI Connect configuration for the bot.

Microsoft Azure Speech Services

Connectivity

To connect to Azure's Speech Service, you need to provide AudioCodes with your subscription key for the service. To obtain the key, see Azure's documentation.

The key is configured on VoiceAI Connect using the credentials > key parameter in the providers section.

Note: The key is valid only for a specific region. The region is configured using the region parameter.
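
For illustration, a provider entry might look like the following minimal sketch. The type value azure, the placement of region at the provider level, and all placeholder values are assumptions; adjust them to your deployment:

{
  "name": "my_azure",
  "type": "azure",
  "region": "westus",
  "credentials": {
    "key": "subscription key from Azure"
  }
}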

Language Definition

To define the language, you need to provide AudioCodes with the 'Locale' and 'Voice name' values from Azure's Text-to-Speech table.

These values are configured on VoiceAI Connect using the language and voiceName parameters, respectively. For example, for Italian, the language parameter should be configured to it-IT and the voiceName parameter to it-IT-ElsaNeural.
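
For illustration, the Italian example could be expressed as the following minimal sketch of the relevant settings (where these parameters are set depends on your configuration level):

{
  "language": "it-IT",
  "voiceName": "it-IT-ElsaNeural"
}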

Customized neural voice

If you have defined a customized synthetic voice with Azure's Custom Neural Voice feature, you need to configure VoiceAI Connect with the ttsDeploymentId parameter to identify the associated text-to-speech endpoint.

For VoiceAI Connect Enterprise, this feature is supported only from Version 2.6 and later.
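
For illustration, a configuration using a custom voice might look like this minimal sketch (the voice name and deployment ID values are hypothetical placeholders):

{
  "voiceName": "MyCustomNeuralVoice",
  "ttsDeploymentId": "deployment ID from Azure"
}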

Custom subdomain names for Azure Cognitive Services

Azure Cognitive Services provides a layered security model. This model enables you to restrict your Cognitive Services accounts to a specific subset of networks. For more details, refer to Azure's documentation.

This section describes how to use a custom subdomain and limit network access with VoiceAI Connect.

Creating Custom Domain

1. In Microsoft Azure’s Speech Services > [service name], select Networking (under Resource Management).

2. Click Generate Custom Domain Name.

3. Type in a custom domain name.

4. Click Save.

The custom domain name will also appear in Microsoft Azure’s Cognitive Services > Speech service.

Allowing only Selected Networks

1. In Microsoft Azure’s Cognitive Services > Speech service, select the service name you created.

2. Click Networking (under Resource Management).

3. If not already selected, select the Firewalls and virtual networks tab.

4. Select Selected Networks and Private Endpoints.

5. Add virtual networks or external IP addresses.

Setting up Private endpoint connections

For VoiceAI Connect Enterprise, this feature is supported only from Version 3.4 and later.

1. In Microsoft Azure’s Cognitive Services > Speech service, select the custom domain you created.

2. Click Networking (under Resource Management).

3. Select the Private endpoint connections tab and create a private endpoint. For more details, refer to Azure's documentation.

4. In the provider configuration, set azureCustomSubdomain to the domain name (only the name, not the FQDN) and set azureIsPrivateEndpoint to true.

The azureIsPrivateEndpoint parameter was deprecated in Version 3.10.1.
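
For illustration, assuming a custom domain named my-speech-service, the provider entry might include the following sketch (azureIsPrivateEndpoint applies only to versions where it is not yet deprecated):

{
  "azureCustomSubdomain": "my-speech-service",
  "azureIsPrivateEndpoint": true
}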

Google Cloud Text-to-Speech

Connectivity

To connect to the Google Cloud Text-to-Speech service, you need to provide AudioCodes with a service account key.

Configuration

The key values are configured on VoiceAI Connect using the privateKey and clientEmail parameters in the providers > credentials section. To create the service account key, refer to Google's documentation. From the JSON object representing the key, extract the private key (including the "-----BEGIN PRIVATE KEY-----" prefix) and the service account email, and configure them as the privateKey and clientEmail values, respectively.
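
For illustration, the provider entry could look like this minimal sketch (the type value google and the placeholder values are assumptions):

{
  "name": "my_google",
  "type": "google",
  "credentials": {
    "privateKey": "-----BEGIN PRIVATE KEY-----\n...",
    "clientEmail": "service account email from Google"
  }
}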

Language Definition

To define the language, you need to provide AudioCodes with the 'Language code' and 'Voice name' values from Google's Supported voices and languages table.

These values are configured on VoiceAI Connect using the language and voiceName parameters, respectively. For example, for English (US), the language parameter should be configured to en-US and the voiceName parameter to en-US-Wavenet-A.

AWS Amazon Polly

Connectivity

To connect to the Amazon Polly Text-to-Speech service, you need to provide AudioCodes with the required AWS connectivity information.

Language Definition

To define the language, you need to provide AudioCodes with the 'Language' and 'Name/ID' values from the Voices in Amazon Polly table.

These values are configured on VoiceAI Connect using the language and voiceName parameters, respectively. For example, for English (US), the language parameter should be configured to en-US and the voiceName parameter to Matthew.

Whether 'Neural Voice' or 'Standard Voice' is used is configured on VoiceAI Connect using the ttsEnhancedVoice parameter. Refer to the Voices in Amazon Polly table to check whether the specific language voice supports Neural Voice and/or Standard Voice.

For VoiceAI Connect Enterprise, the ttsEnhancedVoice parameter (neural voices) is supported from Version 3.0 and later.
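
For illustration, a configuration selecting the neural voice Matthew might look like this minimal sketch (treating ttsEnhancedVoice as a boolean that enables Neural Voice is an assumption; consult your version's parameter reference):

{
  "language": "en-US",
  "voiceName": "Matthew",
  "ttsEnhancedVoice": true
}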

Nuance

Connectivity

To connect to the Nuance Vocalizer for Cloud (NVC) speech service, VoiceAI Connect can use either the WebSocket API or the open-source gRPC API. To connect to Nuance Mix, it must use the gRPC API.

VoiceAI Connect is configured to connect to the specific Nuance API type by setting the type parameter in the providers section to nuance or nuance-grpc.

You need to provide AudioCodes with the URL of your Nuance text-to-speech endpoint instance. This URL (with port number) is configured on VoiceAI Connect using the ttsHost parameter.

Note: Nuance offers a cloud service (Nuance Mix) as well as an option to install an on-premises server. The on-premises server does not use authentication, while the cloud service uses OAuth 2.0 authentication (see below).

VoiceAI Connect supports the Nuance Mix Conversational AI services (gRPC) API interfaces. VoiceAI Connect authenticates itself with Nuance Mix (which is located in the public cloud) using OAuth 2.0. To configure OAuth 2.0, use the following providers parameters: oauthTokenUrl, credentials > oauthClientId, and credentials > oauthClientSecret.

Nuance Mix is supported only by VoiceAI Connect Enterprise from Version 2.6 and later.
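
For illustration, a Nuance Mix provider entry might look like this minimal sketch (all values are placeholders to be replaced with those from your Nuance Mix account):

{
  "name": "my_nuance_mix",
  "type": "nuance-grpc",
  "ttsHost": "TTS endpoint URL (with port) from Nuance",
  "oauthTokenUrl": "OAuth token URL from Nuance",
  "credentials": {
    "oauthClientId": "client ID from Nuance Mix",
    "oauthClientSecret": "client secret from Nuance Mix"
  }
}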

Language Definition

To define the language, you need to provide AudioCodes with the 'Language' and 'Voice' values from Nuance's Vocalizer Language Availability table.

These values are configured on VoiceAI Connect using the language and voiceName parameters, respectively. For example, for English (US), the language parameter should be configured to en-US and the voiceName parameter to Kate.

ReadSpeaker

Connectivity

To connect to ReadSpeaker Text-to-Speech service, enter the information provided by your ReadSpeaker account manager upon delivery of the service.

For authentication, enter the entire string of the authorization key as provided by ReadSpeaker. This value is a combination of your ReadSpeaker account ID and a private key (e.g., "1234.abcdefghijklmnopqrstuvxuz1234567"). It is configured on VoiceAI Connect using the credentials > key parameter under the providers section.

To request a new key, contact the ReadSpeaker support team or your ReadSpeaker account manager.

The endpoint value to use in your AudioCodes implementation is provided by the ReadSpeaker team.
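
Putting these together, the provider entry might look like this minimal sketch (the type value readspeaker is an assumption, as is mapping the endpoint to the ttsHost parameter; use the exact values supplied by ReadSpeaker):

{
  "name": "my_readspeaker",
  "type": "readspeaker",
  "ttsHost": "endpoint provided by ReadSpeaker",
  "credentials": {
    "key": "1234.abcdefghijklmnopqrstuvxuz1234567"
  }
}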

Language Definition

To define the language and voice, you need to provide AudioCodes with the 'Language' and 'Voice' values supplied by ReadSpeaker.

These values are configured on VoiceAI Connect using the language and voiceName parameters, respectively. For example, for English (US) using the voice Paul, the language parameter should be configured to en-US and the voiceName parameter to Paul.

Yandex

To connect to Yandex, contact AudioCodes for information.

ElevenLabs

For VoiceAI Connect Enterprise, this feature is supported only from Version 3.22 and later.

Connectivity

To connect VoiceAI Connect to the ElevenLabs text-to-speech service, you need to provide AudioCodes with your ElevenLabs API key.

This key must be configured on VoiceAI Connect using the credentials > key parameter under the providers section.

The provider type under the providers section must be configured to elevenlabs.

For example:

{
  "name": "my_elevenlabs",
  "type": "elevenlabs",
  "credentials": {
    "key": "api key from elevenlabs"
  }
}

Language Definition

To define the voice and model, enter the values provided by ElevenLabs.

The 'voice-id' value is configured on VoiceAI Connect using the voiceName parameter.

The 'model-id' is configured on VoiceAI Connect using the ttsModel parameter.
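
For illustration, the relevant settings could look like this minimal sketch (both values are placeholders for the identifiers provided by ElevenLabs):

{
  "voiceName": "voice-id from ElevenLabs",
  "ttsModel": "model-id from ElevenLabs"
}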

Advanced Parameters

ElevenLabs advanced parameters can be added in the provider section. For a list of the advanced parameters, refer to ElevenLabs documentation.

Example of advanced configuration:

{
  "ttsOverrideConfig": {
    "query": {
      "optimize_streaming_latency": 2
    },
    "body": {
      "voiceSettings": {
        "stability": 3
      }
    }
  }
}

Deepgram

Connectivity

To connect VoiceAI Connect with Deepgram's text-to-speech service, you need to provide AudioCodes with your Deepgram API key.

This key must be configured on VoiceAI Connect using the credentials > key parameter under the providers section.

The provider type under the providers section must be configured to deepgram.

For example:

{
  "name": "my_deepgram",
  "type": "deepgram",
  "credentials": {
    "key": "API key from Deepgram"
  }
}

The default URL to Deepgram's API is api.deepgram.com. However, you can override this URL using the ttsHost parameter.
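
For illustration, adding ttsHost to the provider entry shown above overrides the default (the host value is a placeholder):

{
  "name": "my_deepgram",
  "type": "deepgram",
  "ttsHost": "your-deepgram-host.example.com",
  "credentials": {
    "key": "API key from Deepgram"
  }
}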

Voice Definition

Deepgram offers various text-to-speech voices, listed in Deepgram's documentation under the Values column (for example, "aura-asteria-en" for a US English female voice). The voice is configured on VoiceAI Connect using the voiceName parameter.

Connecting Deepgram using AudioCodes Live Hub

If you want to connect to Deepgram's speech services using AudioCodes Live Hub:

  1. Sign in to the Live Hub portal.

  2. From the Navigation Menu pane, click Speech Services.

  3. Click the Add new speech service button, and then do the following:

    1. In the 'Speech service name' field, type a name for the speech service.

    2. Select only the Text to Speech check box.

    3. Select the Generic provider option.

    4. Click Next.

  4. In the 'Authentication Key' field, enter the token supplied by Deepgram.

  5. In the 'Text to Speech (TTS) URL' field, enter the URL supplied by Deepgram.

  6. Click Create.

Almagu

Connectivity

To connect to Almagu, contact AudioCodes for information.

Language Definition

To define the language, you need to provide AudioCodes with the 'Voice' value from the Almagu documentation.

This value is configured on VoiceAI Connect using the language parameter.