Text-to-Speech API

A third-party text-to-speech vendor (other than the commonly used ones such as Azure, AWS, and Google) can integrate with VoiceAI Connect Enterprise by implementing the server side of this API. VoiceAI Connect Enterprise acts as the text-to-speech client and sends an HTTP POST request to a predefined URL.

The client sends an Authorization header in the HTTP request, containing a shared token. The text-to-speech server can use the token to identify the client, for example:

Authorization: Bearer <token>
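On the server side, the token check could look like the following minimal Python sketch. The function name `is_authorized` and the value of `EXPECTED_TOKEN` are hypothetical; how the shared token is provisioned and stored is not specified here and is left as an assumption.

```python
import hmac
from typing import Optional

# Hypothetical shared token; a real server would load it from secure configuration.
EXPECTED_TOKEN = "example-shared-token"


def is_authorized(authorization_header: Optional[str],
                  expected_token: str = EXPECTED_TOKEN) -> bool:
    """Return True if the header has the form 'Bearer <token>' with the expected token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A server would typically reject the request with an HTTP error status (for example, 401) when this check fails.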

Request Body Attributes

Parameter	Type	Description
`language`	String	Defines the BCP-47 language code of the speech to synthesize.
`format`	String	Defines the format of the audio file (configured by the `ttsPreferWave` parameter). `raw`: Audio without headers. `wav`: Audio with WAV headers.
`encoding`	String	Defines how the audio is stored and transmitted. Currently, only 16-bit linear pulse-code modulation (PCM) encoding (`LINEAR16`) is supported.
`sampleRateHz`	Number	Defines the sample rate (in Hertz) of the synthesized audio. Currently, only 16,000 Hz is supported.
`voice`	String	Defines the name of the voice used for speech synthesis.
`type`	String	Defines the type of text. If it contains SSML, the type is set to `ssml`.
`text`	String	Defines the text to synthesize.
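A text-to-speech server might validate the incoming body against the attributes above before synthesizing. The following Python sketch is illustrative only: the function name `parse_tts_request`, the added `is_ssml` key, and the choice of which attributes are treated as mandatory are assumptions, not part of the API definition.

```python
import json

SUPPORTED_FORMATS = {"raw", "wav"}


def parse_tts_request(raw_body: bytes) -> dict:
    """Parse and validate a request body; raise ValueError on invalid input."""
    body = json.loads(raw_body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    # Assumption: these attributes are always present and are strings.
    for name in ("language", "format", "encoding", "voice", "text"):
        if not isinstance(body.get(name), str):
            raise ValueError(f"missing or non-string attribute: {name}")
    if body["format"] not in SUPPORTED_FORMATS:
        raise ValueError("format must be 'raw' or 'wav'")
    if body["encoding"] != "LINEAR16":
        raise ValueError("only LINEAR16 encoding is supported")
    if body.get("sampleRateHz") != 16000:
        raise ValueError("only a 16,000 Hz sample rate is supported")
    # Assumption: 'type' is optional and is only sent for SSML input.
    body["is_ssml"] = body.get("type") == "ssml"
    return body
```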

Response Body Attributes

On success, the text-to-speech server replies with a 200 OK response whose body contains the synthesized speech. On failure, the server replies with an HTTP error status code.
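As an illustration of the two audio formats, the sketch below builds a response body from 16-bit PCM samples using Python's standard `wave` module. The function name `build_audio_body` is hypothetical, and mono (single-channel) audio is an assumption, since the channel count is not stated here.

```python
import io
import wave


def build_audio_body(pcm_samples: bytes, audio_format: str,
                     sample_rate_hz: int = 16000) -> bytes:
    """Return bare PCM for 'raw', or PCM wrapped in a WAV container for 'wav'."""
    if audio_format == "raw":
        return pcm_samples
    if audio_format != "wav":
        raise ValueError(f"unsupported format: {audio_format}")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)               # assumption: mono audio
        wav_file.setsampwidth(2)               # 2 bytes per sample (LINEAR16)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_samples)
    return buffer.getvalue()
```

The returned bytes would be sent as the body of the 200 OK response.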

Example

Example 1 (plain text):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "text": "Text to be played"
}

Example 2 (SSML):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "type": "ssml",
  "text": "<speak><say-as interpret-as=\"ordinal\">1</say-as></speak>"
}
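To exercise a server implementation during development, a vendor could generate requests shaped like the examples above. The Python sketch below only constructs the request object. The function name `build_tts_request`, the URL used in the usage note, and the `Content-Type: application/json` header are assumptions and are not confirmed by this document.

```python
import json
import urllib.request


def build_tts_request(url: str, token: str, text: str, voice: str,
                      language: str = "en-US", prefer_wave: bool = True,
                      is_ssml: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request that mimics the client."""
    body = {
        "language": language,
        "format": "wav" if prefer_wave else "raw",  # mirrors ttsPreferWave
        "encoding": "LINEAR16",
        "sampleRateHz": 16000,
        "voice": voice,
        "text": text,
    }
    if is_ssml:
        body["type"] = "ssml"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )
```

For example, `urllib.request.urlopen(build_tts_request("http://localhost:8080/tts", "example-shared-token", "Text to be played", "SomeVoiceName")).read()` would return the synthesized audio from a server listening at that hypothetical address.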

Configuration

Parameter	Type	Description
`ttsPreferWave`	Boolean	Defines the format of the audio file. `true`: (Default) WAV audio file format (with a WAV header). `false`: RAW audio file format (without a header). Note: This parameter is only relevant to AC-TTS-API.
