Speaker verification
VoiceAI Connect can integrate with a speaker verification service to verify a person's identity on behalf of the bot, based on speech samples. The verification itself is performed by a third-party service.
Each speaker recognition system has two phases:
- Enrollment - The speaker's voice is recorded and specific voice features are extracted into a voice print.
- Verification - A speech sample is compared against a previously created voice print.
Speaker verification systems fall into two categories:
- Text-Dependent - The user is expected to say a specific pre-defined phrase. This requires less time to verify.
- Text-Independent - The system analyzes free speech from the user. This can be performed passively, without requiring the user to say specific phrases (it can also be language independent).
In a typical bot deployment, VoiceAI Connect receives a phone call and connects it to your bot. The bot requests a speaker ID from the user and either begins the enrollment process if the user's speaker ID is not in the system, or it begins the verification process if the speaker ID is already in the system.
How do I use it?
The following sections explain how to integrate your bot with the speaker verification feature.
For an example of how to implement such a bot, see speaker verification bot examples.
Creating a user's Speaker Profile
Depending on your Speaker Verification service provider, you may need to explicitly create a profile before enrolling a new user in the service. Other services implicitly create the profile during the enrollment process.
At the present time, Azure Speaker Verification requires the explicit creation of the Speaker Profile, and returns a Profile ID for subsequent use with the APIs. The value of this ID is determined by Azure and not by the bot, so typical implementations using Azure Speaker Verification will require the bot to maintain a mapping of customer IDs to Speaker Profile IDs.
To create a voice profile for a speaker and retrieve its ID, the bot should send the “speakerVerificationCreateSpeaker” event. It can specify the speaker verification type in the “speakerVerificationType” parameter (the default is text-independent).
For example:
{ "type": "event", "name": "speakerVerificationCreateSpeaker", "activityParams": { "speakerVerificationType": "text-dependent" } }
When handling the event, VoiceAI Connect contacts the Microsoft verification service to create the voice profile and retrieve its ID.
When a voice profile is created for a speaker, the Microsoft engine returns a profile ID (in UUID format). The mapping between the speaker's identity and the voice profile ID should be kept on the bot side. The bot should send the speaker profile ID in the sessionParams "speakerVerificationSpeakerId" parameter of the events sent to VoiceAI Connect.
The voice profile ID is sent to the bot in the rawResult field of the “speakerVerificationActionResult” event. The bot should use this value as the value of “speakerVerificationSpeakerId” in subsequent events.
For example:
{ "type": "event", "name": "speakerVerificationActionResult", "value": { "success": true, "rawResult": { "profileId": "12414fd5-a34a-4eb5-8196-d2b12d657a2a", "profileType": 2 } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
success | Boolean | Indicates whether the operation succeeded. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
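Because the profile ID is chosen by the verification service, the bot has to remember which customer each ID belongs to. The following is a minimal Python sketch of that bookkeeping, assuming the event arrives as a parsed dictionary shaped like the example above. The `ProfileStore` class, its method names, and the customer IDs are hypothetical and not part of VoiceAI Connect.

```python
from typing import Dict, Optional


class ProfileStore:
    """Hypothetical in-memory mapping of customer IDs to voice profile IDs."""

    def __init__(self) -> None:
        self._profiles: Dict[str, str] = {}

    def handle_action_result(self, customer_id: str, event: dict) -> Optional[str]:
        """Store the profile ID carried by a speakerVerificationActionResult event.

        Returns the stored profile ID, or None if the event is not a
        successful result that contains a profileId.
        """
        if event.get("name") != "speakerVerificationActionResult":
            return None
        value = event.get("value", {})
        if not value.get("success"):
            return None
        raw = value.get("rawResult")
        # rawResult depends on the verification service; only a dictionary
        # with a "profileId" key (as in the example above) is handled here.
        if not isinstance(raw, dict) or "profileId" not in raw:
            return None
        self._profiles[customer_id] = raw["profileId"]
        return raw["profileId"]

    def speaker_id_for(self, customer_id: str) -> Optional[str]:
        """Return the value to send as speakerVerificationSpeakerId, if known."""
        return self._profiles.get(customer_id)
```

A production bot would likely back this mapping with persistent storage, since the profile ID is needed again on every later call from the same customer.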
Getting user's speaker ID status
After a call is initiated and the bot has prompted for, and then created or received, the user's speaker ID, the bot sends a speakerVerificationGetSpeakerStatus API command (with the speaker ID) to VoiceAI Connect.
VoiceAI Connect sends the information to the verification service and returns the speaker ID status (enrolled true/false) to the bot.
Example of a speakerVerificationGetSpeakerStatus event:
{ "type": "event", "name": "speakerVerificationGetSpeakerStatus", "activityParams": { "speakerVerificationSpeakerId": "123456" } }
{ "type": "event", "name": "speakerVerificationGetSpeakerStatus", "channelData": { "activityParams": { "speakerVerificationSpeakerId": "123456" } } }
Add a Custom Payload fulfillment with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationGetSpeakerStatus", "activityParams": { "speakerVerificationSpeakerId": "123456" } }] }
Add a Custom Payload response with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationGetSpeakerStatus", "activityParams": { "speakerVerificationSpeakerId": "123456" } }] }
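The payloads above carry the same event name and activityParams and differ only in how they are wrapped. The following Python sketch builds each shape from one set of parameters. The `build_event` helper and its `style` labels ("plain", "channelData", "activities") are illustrative names chosen here, not identifiers defined by VoiceAI Connect.

```python
def build_event(name: str, params: dict, style: str = "plain") -> dict:
    """Build a speaker verification event in one of the shapes shown above.

    style="plain":       activityParams directly on the event
    style="channelData": activityParams nested under channelData
    style="activities":  event wrapped in an "activities" list (Custom Payload)
    """
    if style == "plain":
        return {"type": "event", "name": name, "activityParams": dict(params)}
    if style == "channelData":
        return {
            "type": "event",
            "name": name,
            "channelData": {"activityParams": dict(params)},
        }
    if style == "activities":
        return {
            "activities": [
                {"type": "event", "name": name, "activityParams": dict(params)}
            ]
        }
    raise ValueError(f"unknown style: {style!r}")
```

The same helper can produce the enrollment, verification, and deletion events described later by changing only the event name and parameters.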
The speaker ID status is sent to the bot as the speakerVerificationSpeakerStatus event.
Example of a speakerVerificationSpeakerStatus event:
{ "type": "event", "name": "speakerVerificationSpeakerStatus", "value": { "success": true, "enrolled": true, "rawResult": "{...}" } }
The fields are sent inside the event-speakerVerificationSpeakerStatus session parameter, and can be accessed using a syntax such as this:
$session.params.event-speakerVerificationSpeakerStatus.success
{ "queryInput": { "event": { "languageCode": "en-US", "name": "speakerVerificationSpeakerStatus", "parameters": { "success": true, "enrolled": true, "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
success | Boolean | Indicates whether the operation succeeded. |
enrolled | Boolean | Indicates whether the speaker ID is already enrolled in the verification service. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
| String | In case of failure, includes free text explaining the failure. |
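The status event drives the choice between enrollment and verification. The following Python sketch shows that branch, assuming the event is a parsed dictionary in the "value" form of the example above. The function name `next_event_name` is hypothetical, and what the bot does on failure (here, returning None) is an assumption left to the bot designer.

```python
from typing import Optional


def next_event_name(status_event: dict) -> Optional[str]:
    """Pick the next event to send after speakerVerificationSpeakerStatus.

    Returns "speakerVerificationVerify" for an enrolled speaker,
    "speakerVerificationEnroll" for a speaker who is not enrolled,
    and None when the status lookup itself failed.
    """
    value = status_event.get("value", {})
    if not value.get("success"):
        # The lookup failed, so the enrolled flag cannot be trusted.
        return None
    if value.get("enrolled"):
        return "speakerVerificationVerify"
    return "speakerVerificationEnroll"
```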
Call initiation flow example
Enrollment
If the speakerVerificationGetSpeakerStatus command indicates that the user is not enrolled (i.e., the user's speaker ID does not exist in the verification system), the bot can (with the user's permission) initiate a speaker verification enrollment procedure by sending a speakerVerificationEnroll API command.
Example of a speakerVerificationEnroll event:
{ "type": "event", "name": "speakerVerificationEnroll", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }
{ "type": "event", "name": "speakerVerificationEnroll", "channelData": { "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } } }
Add a Custom Payload fulfillment with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationEnroll", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }] }
Add a Custom Payload response with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationEnroll", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }] }
Receiving enrollment progress notifications
When handling the enrollment event, VoiceAI Connect sends the user's audio to the verification service.
If the enrollment requires additional samples, the speakerVerificationEnrollProgress event is sent to the bot. This event is especially useful for text-dependent verification, because in that case the bot needs to ask the user to repeat the passphrase.
Example of a speakerVerificationEnrollProgress event:
{ "type": "event", "name": "speakerVerificationEnrollProgress", "value": { "moreAudioRequired": true, "rawResult": "{...}" } }
The fields are sent inside the event-speakerVerificationEnrollProgress session parameter, and can be accessed using a syntax such as this:
$session.params.event-speakerVerificationEnrollProgress.moreAudioRequired
{ "queryInput": { "event": { "languageCode": "en-US", "name": "speakerVerificationEnrollProgress", "parameters": { "moreAudioRequired": true, "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
moreAudioRequired | Boolean | When set to true, indicates that additional utterances are required from the user to complete the enrollment. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
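A bot typically reacts to the progress event by prompting the user again. The following Python sketch illustrates one way to do that, assuming the event is a parsed dictionary in the "value" form shown above. The function name and the prompt wording are purely illustrative.

```python
from typing import Optional


def enrollment_prompt(progress_event: dict, phrase: Optional[str] = None) -> Optional[str]:
    """Return the next prompt to play, or None if no more audio is needed.

    For text-dependent enrollment, pass the passphrase so that the user
    is asked to repeat it; for text-independent enrollment, omit it.
    """
    value = progress_event.get("value", {})
    if not value.get("moreAudioRequired"):
        return None
    if phrase:
        return f'Please say your passphrase again: "{phrase}"'
    return "Please keep talking so I can learn your voice."
```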
Enrollment Restrictions
When using Azure Speaker Verification service, the user is restricted to the following phrases:
For text-dependent, one of the following phrases must be used for enrollment and subsequent verification:
- I am going to make him an offer he cannot refuse.
- Houston we have had a problem.
- My voice is my passport verify me.
- Apple juice tastes funny after toothpaste.
- You can get in without your password.
- You can activate security system now.
- My voice is stronger than passwords.
- My password is not your business.
- My name is unknown to you.
- Be yourself everyone else is already taken.
Text-independent verification places no restrictions on what the speaker says during enrollment, apart from the initial activation phrase, which must be:
- I'll talk for a few seconds so you can recognize my voice in the future.
There are no restrictions on the audio during the verification stage.
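A bot that uses text-dependent verification with Azure may want to check a configured passphrase against the list above before sending it in speakerVerificationPhrase. The following Python sketch does this on the bot side. The function and constant names are hypothetical, and ignoring case and punctuation in the comparison is an assumption of this sketch, not documented behavior of the verification service.

```python
import string

# Phrases copied from the text-dependent list above.
AZURE_TEXT_DEPENDENT_PHRASES = (
    "I am going to make him an offer he cannot refuse.",
    "Houston we have had a problem.",
    "My voice is my passport verify me.",
    "Apple juice tastes funny after toothpaste.",
    "You can get in without your password.",
    "You can activate security system now.",
    "My voice is stronger than passwords.",
    "My password is not your business.",
    "My name is unknown to you.",
    "Be yourself everyone else is already taken.",
)


def _normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


_ALLOWED = {_normalize(p) for p in AZURE_TEXT_DEPENDENT_PHRASES}


def is_allowed_phrase(phrase: str) -> bool:
    """Return True if the phrase matches one of the listed phrases."""
    return _normalize(phrase) in _ALLOWED
```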
Enrollment completion
When the verification service completes the enrollment, VoiceAI Connect sends the speakerVerificationEnrollCompleted event to the bot, indicating the result.
Example of a speakerVerificationEnrollCompleted event:
{ "type": "event", "name": "speakerVerificationEnrollCompleted", "value": { "success": true, "rawResult": "{...}" } }
The fields are sent inside the event-speakerVerificationEnrollCompleted session parameter, and can be accessed using a syntax such as this:
$session.params.event-speakerVerificationEnrollCompleted.success
{ "queryInput": { "event": { "languageCode": "en-US", "name": "speakerVerificationEnrollCompleted", "parameters": { "success": true, "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
success | Boolean | Indicates whether the enrollment operation succeeded. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
| Array of objects | The results of the intermediate operations (e.g., of each utterance) prior to the last result. Note: The value of this field depends on the verification service. |
| String | In case of failure, includes free text explaining the failure. |
Enrollment flow example
Verification
If the speakerVerificationGetSpeakerStatus command indicates that the user is enrolled (i.e., the user's speaker ID exists in the verification system), the bot can initiate a speaker verification procedure by sending a speakerVerificationVerify API command.
Example of a speakerVerificationVerify event:
{ "type": "event", "name": "speakerVerificationVerify", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }
{ "type": "event", "name": "speakerVerificationVerify", "channelData": { "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } } }
Add a Custom Payload fulfillment with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationVerify", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }] }
Add a Custom Payload response with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationVerify", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "123456", "speakerVerificationPhrase": "My voice is my password" } }] }
VoiceAI Connect starts the verification operation by sending the user's audio to the verification service.
Receiving verification progress notifications
When working in text-independent mode, several user utterances are usually required to complete the verification.
In that case, after processing each intermediate utterance, VoiceAI Connect sends the speakerVerificationVerifyProgress event to the bot.
Example of a speakerVerificationVerifyProgress event:
{ "type": "event", "name": "speakerVerificationVerifyProgress", "value": { "moreAudioRequired": true, "rawResult": "{...}" } }
The fields are sent inside the event-speakerVerificationVerifyProgress session parameter, and can be accessed using a syntax such as this:
$session.params.event-speakerVerificationVerifyProgress.moreAudioRequired
{ "queryInput": { "event": { "languageCode": "en-US", "name": "speakerVerificationVerifyProgress", "parameters": { "moreAudioRequired": true, "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
moreAudioRequired | Boolean | When set to true, indicates that additional utterances are required from the user to complete the verification. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
Verification completion
When the verification service finishes, VoiceAI Connect sends the speakerVerificationVerifyCompleted event to the bot, indicating the result.
If there is not enough audio to match a voice print, VoiceAI Connect sends the speakerVerificationVerifyCompleted event to the bot with "success" set to false.
Example of a speakerVerificationVerifyCompleted event:
{ "type": "event", "name": "speakerVerificationVerifyCompleted", "value": { "success": true, "verified": "yes", "rawResult": "{...}" } }
The fields are sent inside the event-speakerVerificationVerifyCompleted session parameter, and can be accessed using a syntax such as this:
$session.params.event-speakerVerificationVerifyCompleted.verified
{ "queryInput": { "event": { "languageCode": "en-US", "name": "speakerVerificationVerifyCompleted", "parameters": { "success": true, "verified": "yes", "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
success | Boolean | Indicates whether the verification operation succeeded. |
verified | String | Indicates the result of the verification (for example, "yes", as in the example above). This field is only sent if the operation succeeded. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
| Array of objects | The results of the intermediate operations (e.g., of each utterance) prior to the last result. Note: The value of this field depends on the verification service. |
| String | In case of failure, includes free text explaining the failure. |
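The completion event carries two separate signals: whether the operation itself succeeded, and whether the speaker was verified. The following Python sketch combines them, assuming the event is a parsed dictionary in the "value" form shown above. The function name is hypothetical. Only the "yes" value appears in this section, so any other value of verified is conservatively treated as not verified.

```python
def is_verified(completed_event: dict) -> bool:
    """Return True only for a successful operation with verified == "yes"."""
    value = completed_event.get("value", {})
    if not value.get("success"):
        # Covers the "not enough audio" case, where success is false
        # and the verified field is not sent at all.
        return False
    return value.get("verified") == "yes"
```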
Verification flow example
Unenrollment
There are cases where you want to remove a speaker from the verification service (e.g., the speaker needs to be re-enrolled, or the speaker no longer consents to have their voice print in the system).
To remove a speaker from the service, the bot sends the speakerVerificationDeleteSpeaker event, indicating the user's speaker ID in the speakerVerificationSpeakerId parameter.
Example of a speakerVerificationDeleteSpeaker event:
{ "type": "event", "name": "speakerVerificationDeleteSpeaker", "activityParams": { "speakerVerificationSpeakerId": "123456" } }
{ "type": "event", "name": "speakerVerificationDeleteSpeaker", "channelData": { "activityParams": { "speakerVerificationSpeakerId": "123456" } } }
Add a Custom Payload fulfillment with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationDeleteSpeaker", "activityParams": { "speakerVerificationSpeakerId": "123456" } }] }
Add a Custom Payload response with the following content:
{ "activities": [{ "type": "event", "name": "speakerVerificationDeleteSpeaker", "activityParams": { "speakerVerificationSpeakerId": "123456" } }] }
When handling the event, VoiceAI Connect contacts the verification service to delete the specified speaker ID.
Upon completion of the operation, VoiceAI Connect sends the SpeakerVerificationActionResult event to the bot.
Example of a SpeakerVerificationActionResult event:
{ "type": "event", "name": "SpeakerVerificationActionResult", "value": { "success": true, "rawResult": "{...}" } }
The fields are sent inside the event-SpeakerVerificationActionResult session parameter, and can be accessed using a syntax such as this:
$session.params.event-SpeakerVerificationActionResult.success
{ "queryInput": { "event": { "languageCode": "en-US", "name": "SpeakerVerificationActionResult", "parameters": { "success": true, "rawResult": "{...}" } } } }
The following fields will be sent with the event:
Parameter | Type | Description |
---|---|---|
success | Boolean | Indicates whether the operation succeeded. |
rawResult | Object | The result received from the verification service. Note: The value of this field depends on the verification service. |
| String | In case of failure, includes free text explaining the failure. |
Configuration
Administrative configuration
The following bot configuration parameters are configured by the VoiceAI Connect Administrator:
Parameter | Type | Description |
---|---|---|
speakerVerificationProvider | String | References the service provider used to perform the speaker verification. The value of this parameter should match the provider's name (see the provider and bot configuration examples below). |
| String (optional) | Defines a string that is prefixed to the speakerVerificationSpeakerId value when it is used with the verification service, to ensure that the ID is unique. This parameter can be used if the same verification service instance serves distinct customers whose speakers should be differentiated. |
The following provider configuration parameters are configured by the VoiceAI Connect Administrator:
Parameter | Type | Description |
---|---|---|
| String | Defines the URL of the verification service. The default value for Nuance Gatekeeper: gatekeeper.api.nuance.com |
| String | Defines the URL of the authentication service. The default value for Nuance Gatekeeper: https://auth.crt.nuance.com/oauth2/token |
The following provider parameters are required for the "credentials" section of the provider (for Nuance Gatekeeper):
Parameter | Type | Description |
---|---|---|
| String | Defines the username for authentication with the verification service. |
| String | Defines the password for authentication with the verification service. |
The following parameters are required for the "credentials" section of the provider (for Phonexia):
Parameter | Type | Description |
---|---|---|
| String | Defines the username for authentication with the verification service. |
| String | Defines the password for authentication with the verification service. |
Example of Nuance Gatekeeper provider configuration:
{ "name": "my verify provider", "type": "nuance-grpc", "credentials": { "oauthClientId": "my ClientId", "oauthClientSecret": "my ClientSecret" } }
Example of Nuance Gatekeeper bot configuration:
{ "name": "my bot", "displayName": "My Bot", "provider": "bot provider", "speakerVerificationProvider": "my verify provider", "speakerVerificationTenantScope": "my scope name", "speakerVerificationConfigSet": "text dependent configset", "speakerVerificationType": "text-dependent", "sendEventsToBot": [ "speakerVerificationSpeakerStatus", "speakerVerificationActionResult", "speakerVerificationEnrollProgress", "speakerVerificationVerifyProgress", "speakerVerificationEnrollCompleted", "speakerVerificationVerifyCompleted" ] }
Configuring your bot
The following configuration parameters can be configured by the VoiceAI Connect Administrator, or dynamically by the bot during the conversation (bot overrides VoiceAI Connect configuration):
Parameter | Type | Description |
---|---|---|
speakerVerificationTenantScope | String | The name of the scope given by Nuance for the tenant. Note: This parameter is only applicable to Nuance. |
speakerVerificationType | String | One of "text-dependent" or "text-independent". |
speakerVerificationConfigSet | String | Defines the name of the "configuration set" used for verification by the speaker verification provider. Note: This parameter is only applicable to Nuance and should correspond to the speaker verification type. |
speakerVerificationSpeakerId | String | The Speaker ID. Can be set using placeholders. |
speakerVerificationPhrase | String | (optional) For the text-dependent operation type, the phrase used for the voice signature (if required by the verification service). |
| Number | The maximum number of utterances to send to the verification service for an enroll operation. If the operation is not complete and the number of utterances exceeds this value, the operation is canceled. Valid range: 1-100. Default for text-dependent: 5. Default for text-independent: 20. |
| Number | The maximum number of utterances to send to the verification service for a verify operation. If the operation is not complete and the number of utterances exceeds this value, the operation is canceled. Valid range: 1-100. Default for text-dependent: 1. Default for text-independent: 20. |
sendEventsToBot | Array of strings | To receive notification events, specify the event names in this parameter (for example, the event names listed under sendEventsToBot in the bot configuration example above). |
speakerVerificationVoiceprintTag | String | The combination of Speaker Identifier and Voiceprint tag identifies the voice signature of a specific user. It is set during the enrollment phase and later used during the verification phase to identify the user that needs to be verified. This combination is also used in the get speaker status event. The default is "VP". Note: This parameter is only applicable to Nuance. |
speakerVerificationDeleteOnlyVoiceprint | Boolean | Indicates whether to delete only the speaker verification Voiceprint tag. This parameter is optional. Note: This parameter is only applicable to Nuance (provider type nuance-grpc). |
| Number | The timeout for Speaker Verification operations. Valid range: 100-60000 (ms). Default value: 20000 (20 seconds). This parameter is applicable only to VoiceAI Connect Enterprise Version 3.18 and later. |
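The numeric ranges and defaults in the table above can be checked before a configuration is submitted. The following Python sketch encodes them. The function and constant names are hypothetical, and the limits are passed as plain arguments because the sketch does not assume the names of the corresponding configuration parameters.

```python
from typing import List

# Default maximum utterances, keyed by (operation, verification type),
# as listed in the table above.
DEFAULT_MAX_UTTERANCES = {
    ("enroll", "text-dependent"): 5,
    ("enroll", "text-independent"): 20,
    ("verify", "text-dependent"): 1,
    ("verify", "text-independent"): 20,
}


def validate_limits(max_utterances: int, timeout_ms: int = 20000) -> List[str]:
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not 1 <= max_utterances <= 100:
        problems.append(f"max utterances {max_utterances} is outside 1-100")
    if not 100 <= timeout_ms <= 60000:
        problems.append(f"timeout {timeout_ms} ms is outside 100-60000")
    return problems
```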
Example of retrieving the speaker status with Nuance:
{ "type": "event", "name": "speakerVerificationGetSpeakerStatus", "activityParams": { "speakerVerificationSpeakerId": "123456", "speakerVerificationVoiceprintTag": "VP_TD" } }
Example of starting enrollment with Nuance:
{ "type": "event", "name": "speakerVerificationEnroll", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "some-id-123456", "speakerVerificationVoiceprintTag": "VP_TD" } }
Example of starting verification with Nuance:
{ "type": "event", "name": "speakerVerificationVerify", "activityParams": { "speakerVerificationType": "text-dependent", "speakerVerificationSpeakerId": "some-id-123456", "speakerVerificationVoiceprintTag": "VP_TD" } }
Example of deleting a Voiceprint profile with Nuance:
To delete a speaker’s voiceprint, the bot should send the speakerVerificationDeleteSpeaker event, indicating the Speaker Identifier in the speakerVerificationSpeakerId parameter and the voiceprint tag to be deleted in the speakerVerificationVoiceprintTag parameter, and setting the speakerVerificationDeleteOnlyVoiceprint parameter to true.
{ "type": "event", "name": "speakerVerificationDeleteSpeaker", "activityParams": { "speakerVerificationSpeakerId": "123456", "speakerVerificationVoiceprintTag": "VP_TD", "speakerVerificationDeleteOnlyVoiceprint": true } }
Limitations
This feature uses the speech-to-text service to detect end-of-speech.
For this reason, there are two limitations during the enrollment and verification process:
- Speech-to-text must be enabled.
- Barge-in must remain disabled.