# Active listening mode
Active listening mode refers to the AI Agent's ability to process input while the user is speaking, instead of waiting for the final speech recognition result. This improves agent responsiveness when using standard text-based (non-realtime) Large Language Models.

When active listening mode is enabled, the AI Agent uses hypotheses from the STT engine to generate LLM responses while the user is still speaking. In most cases the last hypothesis is very close to the final recognition, allowing the AI Agent to use the pre-generated LLM response and shorten the end-to-end response latency.

Note that active listening mode consumes additional LLM tokens; you are essentially trading tokens for latency.
Active listening mode can be activated by adding the following in your AI Agent's advanced configuration screen:
```json
{
  "active_listening": {
    "enabled": true
  }
}
```
In addition, you can customize active listening behavior by configuring the additional parameters described below.
| Parameter | Type | Description |
|---|---|---|
| ActiveListening | object | Improve agent responsiveness by generating LLM responses based on STT hypotheses. |

### ActiveListening

| Parameter | Type | Description |
|---|---|---|
| enabled | bool | Enable active listening mode. |
| max_parallel | int | Maximum number of parallel response generations. Default = 3 |
| hypothesis_interval_ms | int | Minimum time between hypotheses, in milliseconds, that triggers response generation. Default = 100 |
| confidence_threshold | float | Confidence threshold for processing hypotheses (0.0 to 1.0). Default = 0.8 |
| similarity_threshold | float | Similarity threshold between the final recognition and the last hypothesis (0.0 to 1.0). Default = 0.9 |
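For reference, here is a sketch of a configuration that sets every parameter explicitly; the values shown are the documented defaults (except `enabled`), so in practice you would only override the ones you need:

```json
{
  "active_listening": {
    "enabled": true,
    "max_parallel": 3,
    "hypothesis_interval_ms": 100,
    "confidence_threshold": 0.8,
    "similarity_threshold": 0.9
  }
}
```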
## Configuring the confidence threshold
Active listening mode takes into consideration the confidence level included in each STT hypothesis event.

Many STT engines, for example Azure STT, do not include a confidence level in hypothesis events. For these engines, we implicitly assume the following confidence levels:
- If the hypothesis contains end-of-sentence punctuation (for example, a period), the confidence level is 0.9.
- Otherwise, the confidence level is 0.8.
Active listening mode uses the confidence_threshold parameter to determine whether to trigger an LLM response for each STT hypothesis event.

Use the following guidelines to configure the optimal confidence_threshold value for your setup:
- For STT engines that do not include punctuation in hypothesis events (for example, Azure STT), keep the confidence_threshold parameter at its default (0.8). This triggers active listening on every hypothesis event.
- For STT engines that include punctuation in hypothesis events (for example, Deepgram Nova 3), change the confidence_threshold value to 0.9. This triggers active listening only on hypothesis events that include end-of-sentence punctuation.
- For the Deepgram Flux STT engine, which includes an explicit "eager end of turn" flag, change the confidence_threshold value to 1.0. Hypotheses carrying the "eager end of turn" flag are assigned a confidence level of 1.0, so this configuration triggers active listening only on these events.
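As an illustration of the second guideline, a configuration for an engine that includes punctuation in hypothesis events (such as Deepgram Nova 3) might raise the threshold so that only punctuation-terminated hypotheses trigger response generation:

```json
{
  "active_listening": {
    "enabled": true,
    "confidence_threshold": 0.9
  }
}
```

For Deepgram Flux, the same configuration with `"confidence_threshold": 1.0` would restrict active listening to hypotheses carrying the "eager end of turn" flag.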