Multi-language setup

Most LLMs can interact with users in multiple languages. For example, the following prompt works seamlessly with the gpt-4o-mini model, allowing users to communicate in either English or Spanish:

You are a helpful assistant providing the user with information on plants and animals.
The user may ask questions in either English or Spanish. Respond in the same language in which the question was asked.

The prompt works perfectly well in text conversations, for example, when chatting via the Chat button on the Agent card. However, voice interactions may not function as smoothly, because the corresponding bot connection is configured to use a single language for speech input and output.

Use one of the following methods to enable your agent to communicate with users in multiple languages during voice conversations:

Use speech-to-text service for language detection

Some speech-to-text services – for example, Microsoft STT – can detect the spoken language. This ability can be used to perform automatic language and voice switching. See Language recognition for speech to text (Microsoft) in Speech customization for a detailed feature description.

To enable the feature, configure your agent to use the Microsoft speech-to-text provider, and specify the primary language and voice name, for example:
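
• Text-to-speech service: Microsoft

• Speech-to-text service: Microsoft

• Language: en-US

• Voice name: en-US-BrianNeural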

After that, navigate to the Bot connections screen, locate the bot connection associated with your agent, click Edit, and add the following to its Advanced configuration tab:

{
  "languageDetectionActivate": true,
  "alternativeLanguages": [
    {
      "language": "es-US",
      "voiceName": "es-US-AshleyNeural"
    }
  ],
  "languageDetectionMode": "continuous",
  "languageDetectionAutoSwitch": true
}

The above configuration contains the following elements:
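
• languageDetectionActivate – enables automatic detection of the spoken language.

• alternativeLanguages – the list of additional languages to detect, each paired with the text-to-speech voice used to play responses in that language.

• languageDetectionMode – controls when language detection is performed.

• languageDetectionAutoSwitch – automatically switches the speech language and voice when one of the alternative languages is detected.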

Change language and voiceName in the example above to match your secondary language and voice name. You can specify multiple alternative languages if needed, as in the example below. Make sure to use a valid voice name supported by your text-to-speech provider.
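
For example, a configuration with two alternative languages might look like this (the French and German voices below are illustrative; substitute languages and voices supported by your text-to-speech provider):

{
  "languageDetectionActivate": true,
  "alternativeLanguages": [
    {
      "language": "fr-FR",
      "voiceName": "fr-FR-DeniseNeural"
    },
    {
      "language": "de-DE",
      "voiceName": "de-DE-KatjaNeural"
    }
  ],
  "languageDetectionMode": "continuous",
  "languageDetectionAutoSwitch": true
}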

When one of the alternative languages is detected, the text-to-speech service is switched accordingly. This ensures that the agent's response is played correctly and the conversation proceeds as expected.

Note that language detection and switching happen only upon the first utterance. This method is therefore not suitable for conversations where users frequently switch languages between utterances.

Use LLM for language detection

Language detection by the speech-to-text engine may not work reliably enough in certain scenarios.

To overcome this limitation, you can use the LLM itself for language detection and a multi-agent topology to update the conversation language.

The Main agent starts the conversation. Once it detects the spoken language, it passes the conversation to the English or Spanish agent accordingly. That agent then updates the conversation's language and voice via the corresponding session parameters.
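
The topology looks like this:

    Main agent
      ├── english-agent
      └── spanish-agent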

To create the setup:

  1. Create a multi-agent topology as shown above.

    1. Configure ‘Orchestration mode’ of the Main agent as delegate.

    2. Add the pass_question tool to the Main agent.

  2. Set the Main agent’s prompt to something like this:

    You are a friendly assistant handling a voice conversation with the user.

    Your task is to detect the spoken language and pass the user question to the corresponding agent:
    
    
    language | agent
    -------- | ------------
    Spanish  | spanish-agent
    English  | english-agent
    
  3. Configure the Main agent (or the corresponding bot connection) to use the Microsoft speech-to-text provider, and specify the primary language and voice name, for example:

    • Text-to-speech service: Microsoft

    • Speech-to-text service: Microsoft

    • Language: en-US

    • Voice name: en-US-BrianNeural

  4. Navigate to the Bot connections screen, locate the bot connection associated with the Main agent, click Edit, and add the following to its Advanced configuration tab:

    {
      "languageDetectionActivate": true,
      "alternativeLanguages": [
         {
           "language": "es-US",
           "voiceName": "es-US-AlonsoNeural"
         }
      ]
    }
    
  5. Navigate back to the Agents screen and configure the following Advanced configuration parameters for the Spanish agent:

    {
      "sessionParams": {
        "language": "es-US",
        "voiceName": "es-US-AlonsoNeural"
      }
    }
    

    Change language and voiceName in the examples above to match your secondary language and voice name. Specify a valid voice name supported by your text-to-speech provider.

Use DTMF for language switching

For scenarios where language detection cannot be performed reliably either by the speech-to-text service or by the LLM, you can use DTMF dial tones for language switching.

Create a multi-agent topology like the one described in the previous section. Skip the configuration of languageDetectionActivate and alternativeLanguages in the bot connection's Advanced configuration tab.

Then change the Main agent's prompt so that, instead of detecting the spoken language, it asks the user to select a language with a dial tone and passes the question to the corresponding agent, as in the example below.
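
A possible Main agent prompt (a sketch, assuming your platform delivers DTMF digits to the agent as user input) could look like this:

    You are a friendly assistant handling a voice conversation with the user.

    Greet the user and ask them to press 1 to continue in English or 2 to continue in Spanish.

    Pass the user question to the corresponding agent based on the received digit:

    digit | agent
    ----- | -------------
    1     | english-agent
    2     | spanish-agent

The English and Spanish agents keep the same sessionParams configuration as in the previous section, so the conversation language and voice are updated once the question is passed.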

Use real-time models

Real-time models support multiple languages and automatically switch to the language spoken by the user. In most cases you will still need to specify the list of supported languages in the prompt to prevent the model from switching to an unsupported language, as in the example below.
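
For example, a constraint like the following in the prompt (an illustrative wording, not a required format) can keep the model within the supported languages:

    Communicate with the user only in English or Spanish. If the user speaks any other language, politely ask them to continue in English or Spanish.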