Speech customization
You can customize various speech features, as described below.
Continuous automatic speech recognition (ASR)
By default, the speech-to-text service detects the user's end of utterance from the duration of detected audio silence (or by other means). Live Hub sends each recognized utterance to the bot as a separate textual message.
Sometimes, end-of-utterance detection occurs too quickly and the user is cut off while speaking, for example, when the user replies with a long description comprising several sentences. In such cases, all the utterances should be sent together to the bot as one long textual message.
Continuous automatic speech recognition (Continuous ASR) enables Live Hub to collect all of the user's utterances. When Live Hub detects silence for a user-defined duration, or the user presses a configured DTMF key (e.g., the # pound key), it concatenates the accumulated speech-to-text recognitions and sends them as a single textual message to the bot. In this way, a longer silence timeout can be configured.
This feature is controlled by the Administrator, but the bot can dynamically control this mode during the conversation.
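How a bot toggles this mode mid-conversation depends on the bot framework integration. As an illustrative sketch only, a bot might send a configuration event carrying session parameters; the parameter names used below (`continuousASR`, `continuousASRTimeoutInMS`, `continuousASRDigits`) are hypothetical placeholders, not identifiers confirmed by this document:

```python
# Illustrative sketch only: the session-parameter names below are hypothetical
# placeholders, not identifiers confirmed by this documentation.
def build_config_event(enable: bool, timeout_ms: int = 3000, end_digit: str = "#") -> dict:
    """Build a bot-side event that toggles Continuous ASR during the conversation."""
    if not 500 <= timeout_ms <= 60000:
        # The documented valid range for the silence timeout is 500-60000 ms.
        raise ValueError("timeout_ms must be between 500 and 60000")
    return {
        "type": "event",
        "name": "config",
        "sessionParams": {
            "continuousASR": enable,                 # hypothetical: enable/disable the mode
            "continuousASRTimeoutInMS": timeout_ms,  # hypothetical: silence duration before sending
            "continuousASRDigits": end_digit,        # hypothetical: DTMF key that flushes immediately
        },
    }
```

The range check mirrors the documented 500–60000 ms limits for the silence timeout, so invalid values fail on the bot side rather than being silently ignored.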
How to use it?
| Parameter | Type | Description |
|---|---|---|
|  | Boolean | Enables the Continuous ASR feature. When enabled, Live Hub concatenates multiple speech-to-text recognitions of the user and sends them as a single textual message to the bot. |
|  | String | Applicable only when Continuous ASR is enabled. Defines a special DTMF key which, if pressed, causes Live Hub to immediately send the accumulated recognitions of the user to the bot. For example, if configured to "#" and the user presses the pound key (#) on the phone's keypad, Live Hub concatenates the accumulated recognitions and sends them as one textual message to the bot. Default: "#". Note: Using this feature incurs an additional delay from the user's perspective, because recognized speech is not sent to the bot immediately. To reduce this delay, configure the parameter to a value appropriate to your environment. |
|  | Number | Applicable only when Continuous ASR is enabled. Defines the automatic speech recognition (ASR) timeout, in milliseconds. The timer starts when a recognition is received from the speech-to-text service. When Live Hub detects silence from the user for this duration, it concatenates all accumulated speech-to-text recognitions and sends them as one textual message to the bot. Valid range: 500 (0.5 seconds) to 60000 (1 minute). Default: 3000. Note: This parameter's value must be less than or equal to the value of the … parameter. |
|  | Number | Applicable only when Continuous ASR is enabled. Defines the timeout (in milliseconds) between hypotheses, and between the last hypothesis and the final recognition. The timer starts when a hypothesis is received from the speech-to-text service. When the timer expires, the user's last utterance is discarded and the previous speech-to-text recognitions are sent to the bot. Valid range: 500 to 6000. Default: 3000. |
|  | String | Specifies a message to send to the bot when the user presses the end digit (e.g., #) without saying anything. Default: `<empty>` (a string that literally contains the word "empty" in angle brackets). If this parameter is set, the configured text is sent to the bot when the user presses the end digit without any speech input; if it is set to an empty string, no message is sent to the bot in this scenario. Note: The … |
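The accumulation behavior described in the table can be sketched as a small buffer with two triggers: a silence timeout after the last recognition, and the configured end digit. This is a minimal illustration of the documented logic, not Live Hub's implementation:

```python
class ContinuousASRBuffer:
    """Minimal sketch of Continuous ASR accumulation (illustrative, not Live Hub code)."""

    def __init__(self, timeout_ms=3000, end_digit="#", empty_message="<empty>"):
        self.timeout_ms = timeout_ms        # silence duration that triggers the send
        self.end_digit = end_digit          # DTMF key that flushes immediately
        self.empty_message = empty_message  # sent when end digit is pressed with no speech;
                                            # "" means send nothing (per the documented behavior)
        self.parts = []
        self.last_recognition_ms = None

    def on_recognition(self, text, now_ms):
        # Each speech-to-text recognition is buffered, not sent immediately.
        self.parts.append(text)
        self.last_recognition_ms = now_ms

    def on_dtmf(self, digit):
        # The configured end digit flushes the buffer right away.
        if digit == self.end_digit:
            if not self.parts:
                return self.empty_message or None
            return self._flush()
        return None

    def on_tick(self, now_ms):
        # Silence for `timeout_ms` after the last recognition triggers the send.
        if self.parts and now_ms - self.last_recognition_ms >= self.timeout_ms:
            return self._flush()
        return None

    def _flush(self):
        message = " ".join(self.parts)  # concatenate into one textual message
        self.parts.clear()
        self.last_recognition_ms = None
        return message
```

For example, two recognitions arriving 2 seconds apart with a 3000 ms timeout are held until 3 seconds of silence have elapsed after the second one, and are then delivered as one concatenated message; pressing "#" with nothing buffered yields the configured empty-speech message instead.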