Multi-language setup

Most LLMs can interact with users in multiple languages. For example, the following prompt works well with the gpt-4o-mini model and allows users to communicate in either English or Hebrew:

You are a helpful assistant providing user with information on plants and animals.
User may ask you questions in either English or Hebrew. Respond in the same language in which the question was asked.
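As a minimal illustration, the prompt above can be sent as the system message of a chat-completion request. The request shape below follows the common OpenAI-style messages format; the helper function and its name are ours, not part of Live Hub:

```python
# Sketch: wiring the bilingual system prompt into a gpt-4o-mini chat request.
SYSTEM_PROMPT = (
    "You are a helpful assistant providing user with information on plants and animals. "
    "User may ask you questions in either English or Hebrew. "
    "Respond in the same language in which the question was asked."
)

def build_request(user_question: str) -> dict:
    """Assemble a chat-completion payload; send it with your preferred SDK."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_question},
        ],
    }
```

Because the language instruction lives in the system message, the same payload works whether the user writes in English or in Hebrew.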

The prompt behaves perfectly well if you chat with it in the text modality, via the Chat button on the Agent card. Voice conversations, however, won't work smoothly, because the corresponding bot connection is configured with a single, specific language.

To overcome this problem, use the automatic language recognition feature of the Live Hub platform, as described in https://techdocs.audiocodes.com/voice-ai-connect/#VAIG_Combined/speech-customization.htm.

To enable the feature, configure your bot connection to use the Microsoft speech-to-text provider, and specify the primary language and voice name.

Then add the following in the Advanced configuration tab:

{
  "languageDetectionActivate": true,
  "alternativeLanguages": [
    {
      "language": "en-US",
      "voiceName": "en-US-AshleyNeural"
    }
  ],
  "languageDetectionAutoSwitch": true
}

Change language and voiceName in the example above to match your secondary language and voice name. You can specify multiple alternative languages if needed. Make sure to use a valid voice name supported by your text-to-speech provider.
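If you maintain several alternative languages, it can help to generate the Advanced configuration JSON programmatically rather than edit it by hand. The sketch below is ours; the fr-FR entry is an illustrative addition, and you should verify that each voiceName is supported by your text-to-speech provider:

```python
import json

# Illustrative language/voice pairs; en-US-AshleyNeural comes from the example
# above, fr-FR-DeniseNeural is an assumed second entry.
alternative_languages = [
    {"language": "en-US", "voiceName": "en-US-AshleyNeural"},
    {"language": "fr-FR", "voiceName": "fr-FR-DeniseNeural"},
]

config = {
    "languageDetectionActivate": True,
    "alternativeLanguages": alternative_languages,
    "languageDetectionAutoSwitch": True,
}

# Paste the printed JSON into the Advanced configuration tab.
print(json.dumps(config, indent=2))
```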

By default, automatic language recognition works only at the beginning of the call (during the first 5 seconds). This can be changed by adding the following parameter to the above configuration:

"languageDetectionMode": "continuous",

When one of the alternative languages is detected, the speech-to-text engine switches accordingly. This ensures that the agent's response is played in the correct voice and the conversation proceeds as expected.

Note that language detection and switching happen only once per call. Therefore, this feature is not suitable for conversations where users constantly switch languages between utterances.