Text-to-Speech API

A third-party TTS vendor (other than commonly used providers such as Azure, AWS, and Google) can integrate with VoiceAI Connect through an API in which VoiceAI Connect acts as the TTS client. The client (VoiceAI Connect) sends an HTTP POST request to a predefined URL.

The client sends an Authorization header in the HTTP request, containing a shared token. The TTS server can use the token to identify the client. Example:

Authorization: Bearer <token>

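
On the TTS server side, the shared token check described above might look like the following minimal sketch. The function name is_authorized and the token value are hypothetical; how the token is provisioned is not covered here, so a hard-coded constant is used purely for illustration.

```python
import hmac

# Hypothetical shared token; in practice this would come from configuration.
EXPECTED_TOKEN = "example-shared-token"


def is_authorized(authorization_header, expected_token=EXPECTED_TOKEN):
    """Return True if the header has the form 'Bearer <token>' with the expected token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A server would typically reject the request with an HTTP error code (for example, 401) when this check fails; the exact code is a design choice of the TTS server.
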
Request Body Attributes

| Parameter | Type | Description |
|---|---|---|
| language | String | Defines the BCP-47 language code for speech synthesis. |
| format | String | Defines the format of the audio file (as configured by the ttsPreferWave parameter): raw (audio without headers) or wav (audio with WAV headers). |
| encoding | String | Defines the manner in which the audio is stored and transmitted. Currently, only 16-bit linear pulse-code modulation (PCM) encoding (LINEAR16) is supported. |
| sampleRateHz | Number | Defines the sample rate (in Hertz) of the synthesized audio. Currently, only 16,000 Hz is supported. |
| voice | String | Defines the name of the voice used for speech synthesis. |
| type | String | Defines the type of text. If the text contains SSML, the type is set to ssml. |
| text | String | Defines the text to synthesize. |
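
A TTS server may want to check incoming request bodies against the attributes listed above before synthesizing. The sketch below is one possible approach, not a prescribed one. The function name validate_tts_request is hypothetical, and treating every attribute except type as required is an assumption based on the fact that only type is described as conditional.

```python
SUPPORTED_FORMATS = {"raw", "wav"}


def validate_tts_request(body):
    """Return a list of problems found in a parsed request body (empty if none)."""
    errors = []
    # Assumption: all string attributes except 'type' are required.
    for name in ("language", "format", "encoding", "voice", "text"):
        if not isinstance(body.get(name), str):
            errors.append(f"{name}: expected a string")
    if isinstance(body.get("format"), str) and body["format"] not in SUPPORTED_FORMATS:
        errors.append("format: expected 'raw' or 'wav'")
    if isinstance(body.get("encoding"), str) and body["encoding"] != "LINEAR16":
        errors.append("encoding: only LINEAR16 is supported")
    if body.get("sampleRateHz") != 16000:
        errors.append("sampleRateHz: only 16000 is supported")
    if "type" in body and not isinstance(body["type"], str):
        errors.append("type: expected a string")
    return errors
```
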

Response Body Attributes

On success, the TTS server replies with a 200 OK response whose body contains the synthesized speech. On failure, the server replies with an HTTP error code.
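
The success and failure behavior above can be sketched as a function that maps a request body to a status code and a response body. This is an illustration under stated assumptions: build_response and synthesize_placeholder are hypothetical names, the placeholder emits silence instead of real speech, the specific error codes (400 and 500) are a choice made here, and the format attribute is ignored for brevity, so the output is always header-less PCM.

```python
import json

SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # LINEAR16 means 16-bit samples


def synthesize_placeholder(text):
    """Hypothetical stand-in for a real engine: 10 ms of silence per character."""
    samples = len(text) * SAMPLE_RATE_HZ // 100
    return bytes(samples * BYTES_PER_SAMPLE)


def build_response(raw_body):
    """Map a raw JSON request body to a (status_code, response_body) pair."""
    try:
        request = json.loads(raw_body)
        text = request["text"]
    except (ValueError, KeyError, TypeError):
        return 400, b""  # malformed JSON or missing 'text'
    if not isinstance(text, str):
        return 400, b""
    try:
        audio = synthesize_placeholder(text)
    except Exception:
        return 500, b""  # synthesis failed on the server side
    return 200, audio
```

The returned pair can be written out by whichever HTTP framework the TTS server uses.
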

Examples

Example 1 (request body with plain text):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "text": "Text to be played"
}

Example 2 (request body with SSML):

{
  "language": "en-US",
  "format": "wav",
  "encoding": "LINEAR16",
  "sampleRateHz": 16000,
  "voice": "SomeVoiceName",
  "type": "ssml",
  "text": "<speak><say-as interpret-as=\"ordinal\">1</say-as></speak>"
}
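
To exercise a TTS server without VoiceAI Connect, request bodies like the two examples above could be assembled with a small helper. This is only a test-client sketch: build_tts_request is a hypothetical name, the URL in the usage note is a placeholder, and the Content-Type header is an assumption, since only the Authorization header is described here.

```python
import json
import urllib.request


def build_tts_request(url, token, text, voice, language="en-US",
                      prefer_wave=True, ssml=False):
    """Build (but do not send) an HTTP POST request shaped like the examples."""
    body = {
        "language": language,
        "format": "wav" if prefer_wave else "raw",
        "encoding": "LINEAR16",
        "sampleRateHz": 16000,
        "voice": voice,
        "text": text,
    }
    if ssml:
        body["type"] = "ssml"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, not stated above
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it, for example build_tts_request("http://localhost:8080/tts", "my-token", "Text to be played", "SomeVoiceName").
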

Configuration

| Parameter | Type | Description |
|---|---|---|
| ttsPreferWave | Boolean | Defines the format of the audio file: true (default) for WAV audio file format (with a WAV header); false for RAW audio file format (without a header). |

Note: This parameter is only relevant to AC-TTS-API.
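
The difference between the two formats selected by ttsPreferWave can be illustrated with Python's standard wave module. This is a sketch, not the required implementation: encode_audio is a hypothetical helper name, and mono audio is an assumption, since the channel count is not specified here.

```python
import io
import wave


def encode_audio(pcm, fmt, sample_rate_hz=16000):
    """Return LINEAR16 samples as-is for 'raw', or wrapped in a WAV header for 'wav'."""
    if fmt == "raw":
        return pcm
    if fmt != "wav":
        raise ValueError(f"unsupported format: {fmt!r}")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono
        wav_file.setsampwidth(2)  # 16-bit samples (LINEAR16)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

With ttsPreferWave set to true, the request carries "format": "wav" and the server would return the output of encode_audio(pcm, "wav"); with false, it would return the unmodified samples.
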