Active listening mode
Active listening mode refers to the AI Agent's ability to process input while the user is still speaking, instead of waiting for the final speech recognition result. This can improve agent responsiveness when using standard textual (non-realtime) Large Language Models (LLMs).
When active listening mode is enabled, the AI Agent uses interim hypotheses from the speech-to-text (STT) engine to generate LLM responses while the user speaks. In most cases the last hypothesis is very close to the final recognition, which lets the AI Agent use the pre-generated LLM response and shortens the end-to-end response latency.
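Note that active listening mode consumes additional LLM tokens, because responses are generated for intermediate hypotheses and those that do not end up matching the final recognition are not used. In effect, you are trading tokens for latency.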
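The reuse step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the AI Agent's implementation: the similarity metric (Python's `difflib.SequenceMatcher` ratio), the function names, and keying pre-generated responses by hypothesis text are all assumptions. The 0.9 threshold mirrors the documented default of `similarity_threshold`.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0]. difflib is an assumed stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_response(
    final_text: str,
    last_hypothesis: str,
    pregenerated: Dict[str, str],
    generate: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[str, bool]:
    """Return (response, reused) once the final recognition arrives.

    pregenerated: hypothetical map of hypothesis text -> LLM response
    produced while the user was still speaking.
    generate: hypothetical callable that sends text to the LLM.
    """
    response = pregenerated.get(last_hypothesis)
    if response is not None and similarity(final_text, last_hypothesis) >= threshold:
        # Close enough: answer immediately with the pre-generated response.
        return response, True
    # Too different (or nothing pre-generated): pay for a fresh LLM call.
    return generate(final_text), False


if __name__ == "__main__":
    pre = {"book a table for two": "pre: book a table for two"}
    llm = lambda text: "fresh: " + text
    # Only a trailing period differs, so the pre-generated response is reused.
    print(pick_response("book a table for two.", "book a table for two", pre, llm))
    # The user kept talking after the last hypothesis, so a new call is made.
    print(pick_response("book a table for two at seven pm", "book a table for two", pre, llm))
```

In the second case the pre-generated response was paid for but not used, which is the token cost mentioned above.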
Active listening mode can be activated by adding the following to your AI Agent's advanced configuration screen:
```json
{
  "active_listening": {
    "enabled": true
  }
}
```
In addition, you can customize active listening mode behavior by configuring the parameters described below.
| Parameter | Type | Description |
|---|---|---|
| active_listening | ActiveListening | Improve agent responsiveness by generating LLM responses based on STT hypotheses. |
ActiveListening
| Parameter | Type | Description |
|---|---|---|
| mode | enum | Active listening mode. Supported values: |
| max_parallel | int | Maximum number of response generations running in parallel. Default: 3 |
| hypothesis_interval_ms | int | Minimum interval, in milliseconds, between hypotheses that trigger response generation. Default: 100 ms |
| similarity_threshold | float | Similarity threshold (0.0 to 1.0) between the final recognition and the last hypothesis. Default: 0.9 |
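To illustrate how `max_parallel` and `hypothesis_interval_ms` might interact, here is a hypothetical gate that decides whether an incoming STT hypothesis starts a new response generation. The class name, its methods, and the policy of simply skipping hypotheses when all slots are busy are assumptions made for this sketch; the actual agent may behave differently (for example, by cancelling older generations instead). Timestamps are in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypothesisGate:
    """Hypothetical model of two documented parameters (defaults match the table)."""

    max_parallel: int = 3
    hypothesis_interval_ms: int = 100
    _last_trigger_ms: Optional[float] = None
    _in_flight: int = 0

    def should_trigger(self, now_ms: float) -> bool:
        # Ignore hypotheses that arrive too soon after the last triggering one.
        if (
            self._last_trigger_ms is not None
            and now_ms - self._last_trigger_ms < self.hypothesis_interval_ms
        ):
            return False
        # Ignore hypotheses while the parallel-generation budget is used up
        # (assumed policy: skip rather than cancel an older generation).
        if self._in_flight >= self.max_parallel:
            return False
        self._last_trigger_ms = now_ms
        self._in_flight += 1
        return True

    def finished(self) -> None:
        # Call when a generation completes or is cancelled, freeing a slot.
        self._in_flight = max(0, self._in_flight - 1)


if __name__ == "__main__":
    gate = HypothesisGate()
    print([gate.should_trigger(t) for t in (0, 50, 120, 250, 400)])
```

With the defaults and no generation finishing in between, hypotheses arriving at 0, 50, 120, 250 and 400 ms trigger generations at 0, 120 and 250 ms: the 50 ms hypothesis falls inside the 100 ms interval, and the 400 ms one finds all three slots busy.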