With Windows 10, Microsoft’s voice assistant Cortana came to the desktop. A small Cortana query box sits to the right of the Start button, ready to take a voice command. To activate it, either start your phrase with “Hey Cortana” or click the box so that Cortana starts listening.

Cortana voice command

You can enrich your app by extending Cortana with voice commands. A voice command can launch your app in the foreground or in the background. Besides launching the app, a voice command can ask for additional context or user input in order to complete a task.

This approach is more suitable if you want to personalize Cortana and turn it into an assistant for your app. By exposing app functionality through Cortana voice commands, you let the user perform an action without opening the app in the first place. You can think of it as a shortcut to your app's functionality.

Cortana is built on top of the speech recognition API, but instead of the SRGS format, the grammar for voice commands is written in the Voice Command Definition (VCD) format.
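To give an idea of what a VCD file looks like, here is a minimal sketch. The app name, command name, phrases and item ids are all hypothetical and only illustrate the structure; it assumes version 1.2 of the VCD schema and a command that launches the app in the foreground.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical VCD file: all names and phrases are made up for illustration. -->
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.2">
  <CommandSet xml:lang="en-us" Name="ItemBrowserCommandSet_en-us">
    <!-- The name the user says to address the app -->
    <AppName>Item Browser</AppName>
    <Example>Item Browser, select item 42</Example>

    <Command Name="selectItem">
      <Example>select item 42</Example>
      <!-- {id} refers to the PhraseList below; [item] is optional -->
      <ListenFor RequireAppName="BeforeOrAfterPhrase">select [item] {id}</ListenFor>
      <Feedback>Selecting item {id}</Feedback>
      <!-- Navigate launches the app in the foreground -->
      <Navigate />
    </Command>

    <!-- The values that {id} can take -->
    <PhraseList Label="id">
      <Item>7</Item>
      <Item>42</Item>
      <Item>100</Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>
```

The app has to register such a file at startup, for example with VoiceCommandDefinitionManager.InstallCommandDefinitionsFromStorageFileAsync, before Cortana recognizes the commands.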

Speech recognition

You can enrich your app by using the Speech Recognition API. The app converts spoken words into text and performs actions based on the business logic behind it. It can also capture dictated text and phrases.

This approach is more suitable if you need to perform some actions within the app by using verbal interaction.

Example: using speech recognition in UWP

The speech recognition classes are located in the Windows.Media.SpeechRecognition namespace. Audio permission classes are located in the Windows.Media.Capture namespace.

The page that implements the voice interaction runs a speech recognizer in the background. The recognizer listens for the commands defined in the SRGS file and raises events as you speak.

The Speech Recognition Grammar Specification (SRGS) is an industry-standard format for describing the phrases to be recognized. You define the verbal commands using SRGS. In this example, the command selects an item by its id, which consists only of digits and is at least one digit long.

SRGS Grammar
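
A grammar for such a command could look like the sketch below. It is an illustration under assumptions, not a definitive grammar: the rule names, the trigger phrase "select item" and the semantic key "id" are hypothetical, and it assumes the "semantics/1.0" tag format so that the spoken digits are concatenated into a single string.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical grammar: "select [item] <one or more digits>" -->
<grammar version="1.0" xml:lang="en-US" root="selectItem"
         tag-format="semantics/1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Root rule: the command phrase followed by the id -->
  <rule id="selectItem" scope="public">
    <item>select</item>
    <item repeat="0-1">item</item>
    <ruleref uri="#digits"/>
    <!-- Expose the collected digits under the key "id" -->
    <tag>out.id = rules.latest();</tag>
  </rule>

  <!-- One or more digits, concatenated into a string -->
  <rule id="digits">
    <tag>out = "";</tag>
    <item repeat="1-">
      <ruleref uri="#digit"/>
      <tag>out += rules.latest();</tag>
    </item>
  </rule>

  <!-- A single spoken digit mapped to its character -->
  <rule id="digit">
    <one-of>
      <item>zero <tag>out = "0";</tag></item>
      <item>one <tag>out = "1";</tag></item>
      <item>two <tag>out = "2";</tag></item>
      <item>three <tag>out = "3";</tag></item>
      <item>four <tag>out = "4";</tag></item>
      <item>five <tag>out = "5";</tag></item>
      <item>six <tag>out = "6";</tag></item>
      <item>seven <tag>out = "7";</tag></item>
      <item>eight <tag>out = "8";</tag></item>
      <item>nine <tag>out = "9";</tag></item>
    </one-of>
  </rule>
</grammar>
```

With this grammar, saying "select item four two" would be expected to produce the semantic value "42" under the key "id".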

Initialize the recognizer

The speech recognizer can be initialized in the page's OnNavigatedTo method. The following steps are needed to initialize a continuous recognition session using an SRGS grammar:

  • Check for permissions
  • Initialize the SpeechRecognizer object
  • Load and compile the SRGS grammar
  • Listen for the ResultGenerated event, which is raised as words are spoken
  • Start the continuous recognition session
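
The steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a complete implementation: the page name ItemsPage, the helper RequestMicrophoneAccessAsync and the grammar path ms-appx:///Grammars/SelectItem.xml are hypothetical, and the app manifest is assumed to declare the Microphone capability.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Media.SpeechRecognition;
using Windows.Storage;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Navigation;

// Hypothetical page that hosts the voice interaction.
public sealed partial class ItemsPage : Page
{
    private SpeechRecognizer recognizer;

    protected override async void OnNavigatedTo(NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);

        // 1. Check for permissions
        if (!await RequestMicrophoneAccessAsync())
        {
            return;
        }

        // 2. Initialize the SpeechRecognizer object
        recognizer = new SpeechRecognizer();

        // 3. Load and compile the SRGS grammar (hypothetical file path)
        StorageFile grammarFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Grammars/SelectItem.xml"));
        recognizer.Constraints.Add(new SpeechRecognitionGrammarFileConstraint(grammarFile));

        SpeechRecognitionCompilationResult compilation =
            await recognizer.CompileConstraintsAsync();
        if (compilation.Status != SpeechRecognitionResultStatus.Success)
        {
            return;
        }

        // 4. Listen for the ResultGenerated event (here it only logs the text)
        recognizer.ContinuousRecognitionSession.ResultGenerated +=
            (session, args) => Debug.WriteLine(args.Result.Text);

        // 5. Start the continuous recognition session
        await recognizer.ContinuousRecognitionSession.StartAsync();
    }

    protected override void OnNavigatedFrom(NavigationEventArgs e)
    {
        base.OnNavigatedFrom(e);

        // Release the recognizer when the user leaves the page.
        recognizer?.Dispose();
        recognizer = null;
    }

    // Hypothetical helper: initializing an audio-only MediaCapture
    // prompts for microphone consent and throws if access is denied.
    private static async Task<bool> RequestMicrophoneAccessAsync()
    {
        try
        {
            var settings = new MediaCaptureInitializationSettings
            {
                StreamingCaptureMode = StreamingCaptureMode.Audio,
                MediaCategory = MediaCategory.Speech
            };

            using (var capture = new MediaCapture())
            {
                await capture.InitializeAsync(settings);
            }
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

In a real app you would probably also tell the user when microphone access is denied or the grammar fails to compile, instead of returning silently.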

Handle speech results

Your app will receive a continuous stream of results via the ResultGenerated event. The listener for this event must do the following:

  • Determine whether the result is a successful recognition
  • Determine whether the result has sufficient confidence
  • Determine the appropriate response (business logic)
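
A handler that follows these steps might look like the sketch below. It rests on several assumptions: the page name ItemsPage and the method SelectItem are hypothetical, the grammar is assumed to expose the spoken digits under the semantic key "id", and treating only High and Medium confidence as sufficient is a design choice rather than a rule.

```csharp
using System;
using System.Collections.Generic;
using Windows.Media.SpeechRecognition;
using Windows.UI.Core;
using Windows.UI.Xaml.Controls;

// Hypothetical page; the handler is meant to be subscribed to
// recognizer.ContinuousRecognitionSession.ResultGenerated.
public sealed partial class ItemsPage : Page
{
    private async void OnResultGenerated(
        SpeechContinuousRecognitionSession sender,
        SpeechContinuousRecognitionResultGeneratedEventArgs args)
    {
        SpeechRecognitionResult result = args.Result;

        // 1. Was the recognition successful?
        if (result.Status != SpeechRecognitionResultStatus.Success)
        {
            return;
        }

        // 2. Is the confidence sufficient? (threshold is an assumption)
        bool confident =
            result.Confidence == SpeechRecognitionConfidence.High ||
            result.Confidence == SpeechRecognitionConfidence.Medium;
        if (!confident)
        {
            return;
        }

        // 3. Business logic: read the id produced by the grammar's semantic tags
        if (result.SemanticInterpretation.Properties.TryGetValue(
                "id", out IReadOnlyList<string> values) && values.Count > 0)
        {
            string id = values[0];

            // The event is not raised on the UI thread, so switch to it
            // before touching any controls.
            await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, () => SelectItem(id));
        }
    }

    // Hypothetical app logic: select the item with the given id.
    private void SelectItem(string id)
    {
        // For example, find the item in a list view and select it.
    }
}
```

Rejecting low-confidence results keeps background noise from triggering actions, at the cost of occasionally asking the user to repeat a command.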
