Text-to-Speech API
Third-party text-to-speech vendors other than the commonly used ones (such as Azure, AWS, and Google) can integrate with VoiceAI Connect Enterprise through an API in which VoiceAI Connect Enterprise acts as the text-to-speech client. The client sends an HTTP POST request with a JSON body to a pre-defined URL on the vendor's text-to-speech server.
The client includes an Authorization header in each HTTP request, containing a shared token. The text-to-speech server can use this token to identify the client, for example:
Authorization: Bearer <token>
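On the server side, checking this header amounts to parsing the Bearer scheme and comparing the token with the shared value. The following Python sketch assumes a single shared token held in a hypothetical EXPECTED_TOKEN constant; the function name is_authorized is illustrative and not part of the API.

```python
import hmac
from typing import Optional

# Hypothetical shared token; in practice it is agreed with the VoiceAI Connect
# Enterprise administrator and kept out of source code.
EXPECTED_TOKEN = "example-shared-token"


def is_authorized(header: Optional[str], expected_token: str = EXPECTED_TOKEN) -> bool:
    """Return True if header has the form 'Bearer <token>' with the expected token."""
    if not header:
        return False
    scheme, _, token = header.partition(" ")
    # HTTP authentication scheme names are case-insensitive.
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

A server would call is_authorized with the raw Authorization header value and reject the request with an HTTP error code when it returns False.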
Request Body Attributes
Parameter | Type | Description |
---|---|---|
language | String | Defines the BCP-47 language code of the text to synthesize (for example, en-US). |
format | String | Defines the format of the audio file, for example, wav (configured by the Boolean parameter described under Configuration). |
encoding | String | Defines how the audio is stored and transmitted. Currently, only 16-bit linear pulse-code modulation (PCM) encoding (LINEAR16) is supported. |
sampleRateHz | Number | Defines the sample rate (in Hertz) of the synthesized audio. Currently, only 16,000 Hz is supported. |
voice | String | Defines the name of the voice used for speech synthesis. |
type | String | Defines the type of text. If the text contains SSML, the type is set to ssml. |
text | String | Defines the text to synthesize. |
Response Body Attributes
On success, the text-to-speech server replies with a 200 OK response whose body contains the synthesized speech. On failure, the server replies with an HTTP error status code.
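The following is a minimal sketch of a compatible server using only the Python standard library. It is illustrative only: the token value, the audio/wav Content-Type, and the choice of 401 and 400 as error codes are assumptions, and the hypothetical silence_wav function stands in for a real synthesis engine by returning half a second of 16-bit mono silence at 16,000 Hz.

```python
import hmac
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

SHARED_TOKEN = "example-shared-token"  # hypothetical shared token


def silence_wav(duration_s: float = 0.5, sample_rate: int = 16000) -> bytes:
    """Placeholder 'synthesis': 16-bit mono PCM WAV containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit linear PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(duration_s * sample_rate))
    return buf.getvalue()


class TtsHandler(BaseHTTPRequestHandler):
    def _fail(self, status: int) -> None:
        """Send an HTTP error status with an empty body."""
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self) -> None:
        # Read the whole body first so the connection closes cleanly on errors.
        try:
            length = int(self.headers.get("Content-Length", 0))
        except ValueError:
            return self._fail(400)
        raw = self.rfile.read(length)

        expected = f"Bearer {SHARED_TOKEN}"
        auth = self.headers.get("Authorization", "")
        if not hmac.compare_digest(auth.encode(), expected.encode()):
            return self._fail(401)

        try:
            body = json.loads(raw)
            if body["encoding"] != "LINEAR16" or body["sampleRateHz"] != 16000 or not body["text"]:
                return self._fail(400)
        except (ValueError, KeyError, TypeError):
            return self._fail(400)

        # A real server would synthesize body["text"] with body["voice"],
        # honouring body.get("type") == "ssml"; this sketch always returns WAV.
        audio = silence_wav(sample_rate=body["sampleRateHz"])
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args) -> None:
        pass  # silence per-request logging
```

To run it locally, for example on port 8080, call HTTPServer(("0.0.0.0", 8080), TtsHandler).serve_forever(). The handler reads the request body before checking the token so that rejected requests do not leave unread data on the socket.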
Examples
Example 1:
{ "language": "en-US", "format": "wav", "encoding": "LINEAR16", "sampleRateHz": 16000, "voice": "SomeVoiceName", "text": "Text to be played" }
Example 2:
{ "language": "en-US", "format": "wav", "encoding": "LINEAR16", "sampleRateHz": 16000, "voice": "SomeVoiceName", "type": "ssml", "text": "<speak><say-as interpret-as=\"ordinal\">1</say-as></speak>" }
Configuration
| Parameter | Type | Description |
|---|---|---|
| | Boolean | Defines the format of the audio file: Note: This parameter is only relevant to AC-TTS-API. |