Text-to-Speech API

Third-party text-to-speech vendors (other than the commonly used providers such as Azure, AWS, and Google) can integrate with VoiceAI Connect Enterprise by implementing the API described here. VoiceAI Connect Enterprise acts as the text-to-speech client: it sends an HTTP POST request to a pre-defined URL on the text-to-speech server.

The client includes an Authorization header containing a shared token in the HTTP request. The text-to-speech server can use the token to identify the client. For example:

Authorization: Bearer <token> 
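
On the server side, checking the shared token could look like the following minimal sketch. The function name is_authorized and the plain dict of headers are assumptions made for illustration; they are not part of the VoiceAI Connect API.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Return True if the Authorization header carries the expected bearer token.

    `headers` is assumed to be a plain dict of request headers (hypothetical;
    adapt this to the header object of your web framework).
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A request that fails this check would typically be answered with an HTTP error code such as 401.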

Request Body Attributes

| Parameter | Type | Description |
|---|---|---|
| language | String | Defines the BCP-47 language code (for example, en-US) to use for speech synthesis. |
| format | String | Defines the format of the audio file (configured by the ttsPreferWave parameter): raw (audio without headers) or wav (audio with WAV headers). |
| encoding | String | Defines how the audio is stored and transmitted. Currently, only 16-bit linear pulse-code modulation (PCM) encoding (LINEAR16) is supported. |
| sampleRateHz | Number | Defines the sample rate (in hertz) of the synthesized audio. Currently, only 16,000 Hz is supported. |
| voice | String | Defines the name of the voice used for speech synthesis. |
| type | String | Defines the type of the text. When the text contains SSML, the type is set to ssml. |
| text | String | Defines the text to synthesize. |
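
A text-to-speech server may want to check the request body against the attributes above before synthesizing. The sketch below is one possible approach, not a prescribed implementation. It assumes that every attribute except type is mandatory, which the attribute descriptions do not state explicitly; the function names are hypothetical.

```python
# Assumption: all attributes except "type" are mandatory in every request.
REQUIRED = ("language", "format", "encoding", "sampleRateHz", "voice", "text")


def validate_tts_request(req: dict) -> list:
    """Return a list of problems found in a decoded request body (empty if none)."""
    errors = [f"missing attribute: {name}" for name in REQUIRED if name not in req]
    if "format" in req and req["format"] not in ("raw", "wav"):
        errors.append("format must be raw or wav")
    if "encoding" in req and req["encoding"] != "LINEAR16":
        errors.append("encoding must be LINEAR16")
    if "sampleRateHz" in req and req["sampleRateHz"] != 16000:
        errors.append("sampleRateHz must be 16000")
    return errors


def is_ssml(req: dict) -> bool:
    """Return True when the request marks its text as SSML."""
    return req.get("type") == "ssml"
```

An empty list means the body matches the documented constraints; otherwise the server can reply with an HTTP error code.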

Response Body Attributes

On success, the text-to-speech server replies with a 200 OK response whose body contains the synthesized speech. On failure, the server replies with an HTTP error code.
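
As an illustration of how a server might produce the response body for either requested format, the sketch below wraps 16-bit PCM samples in a WAV header using Python's standard wave module, or returns them unchanged for raw. The function name build_audio_body is hypothetical, and single-channel (mono) audio is an assumption, since the channel count is not specified here.

```python
import io
import wave


def build_audio_body(pcm: bytes, fmt: str, sample_rate_hz: int = 16000) -> bytes:
    """Return the bytes to place in the 200 OK response body.

    `pcm` is assumed to hold 16-bit little-endian linear PCM samples (LINEAR16).
    """
    if fmt == "raw":
        return pcm  # audio without headers
    if fmt != "wav":
        raise ValueError(f"unsupported format: {fmt}")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # assumption: mono audio
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate_hz)
        w.writeframes(pcm)
    return buf.getvalue()
```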

Example

Example 1 (request body with plain text):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "text": "Text to be played"
}

Example 2 (request body with SSML):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "type": "ssml",
  "text": "<speak><say-as interpret-as=\"ordinal\">1</say-as></speak>"
}
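
To exercise a text-to-speech server during development, a vendor could emulate the client with a small script. The following sketch only builds the request (it does not send it) using Python's standard library. The function name, its parameters, and the Content-Type header are assumptions for illustration; the headers that VoiceAI Connect Enterprise actually sends, other than Authorization, are not listed here.

```python
import json
import urllib.request


def build_tts_request(url: str, token: str, text: str, voice: str,
                      language: str = "en-US", prefer_wave: bool = True,
                      ssml: bool = False) -> urllib.request.Request:
    """Build an HTTP POST request shaped like the example bodies above."""
    body = {
        "language": language,
        "format": "wav" if prefer_wave else "raw",
        "encoding": "LINEAR16",
        "sampleRateHz": 16000,
        "voice": voice,
        "text": text,
    }
    if ssml:
        body["type"] = "ssml"  # included only when the text contains SSML
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, not documented
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the response body is then the synthesized audio.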

Configuration

| Parameter | Type | Description |
|---|---|---|
| ttsPreferWave | Boolean | Defines the format of the audio file: true (default) for WAV audio file format (with a WAV header), or false for RAW audio file format (without a header). |

Note: This parameter is only relevant to AC-TTS-API.