Speech Recognition And Synthesis Managed APIs In Windows Vista: Part II


Playing with the speech synthesizer is a lot of fun for about five minutes (ten if you have both Microsoft Anna and Microsoft Lila to work with)  — but after typing “Hello World” into your Speechpad document for the umpteenth time, you may want to do something a bit more challenging.  If you do, then it is time to plug in your expensive microphone, since speech recognition really works best with a good expensive microphone.  If you don’t have one, however, then go ahead and plug in a cheap microphone.  My cheap microphone seems to work fine.  If you don’t have a cheap microphone, either, I have heard that you can take a speaker and plug it into the mic jack of your computer, and if that doesn’t cause an explosion, you can try talking into it.


While speech synthesis may be useful for certain specialized applications, voice commands, by cantrast, are a feature that can be used to enrich any current WinForms application. With the SR Managed API, it is also easy to implement once you understand certain concepts such as the Grammar class and the SpeechRecognitionEngine.


We will begin by declaring a local instance of the speech engine and initializing it. 

	#region Local Members

private SpeechSynthesizer synthesizer = null;
private string selectedVoice = string.Empty;
private SpeechRecognitionEngine recognizer = null;

#endregion

public Main()
{
InitializeComponent();
synthesizer = new SpeechSynthesizer();
LoadSelectVoiceMenu();
recognizer = new SpeechRecognitionEngine();
InitializeSpeechRecognitionEngine();
}

private void InitializeSpeechRecognitionEngine()
{
recognizer.SetInputToDefaultAudioDevice();
Grammar customGrammar = CreateCustomGrammar();
recognizer.UnloadAllGrammars();
recognizer.LoadGrammar(customGrammar);
recognizer.SpeechRecognized +=
new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
recognizer.SpeechHypothesized +=
new EventHandler<SpeechHypothesizedEventArgs>(recognizer_SpeechHypothesized);
}

private Grammar CreateCustomGrammar()
{
GrammarBuilder grammarBuilder = new GrammarBuilder();
grammarBuilder.Append(new Choices(“cut”, “copy”, “paste”, “delete”));
return new Grammar(grammarBuilder);
}


The speech recognition engine is the main workhorse of the speech recognition functionality.  At one end, we configure the input device that the engine will listen on.  In this case, we use the default device (whatever you have plugged in), though we can also select other inputs, such as specific wave files.  At the other end, we capture two events thrown by our speech recognition engine.  As the engine attempts to interpret the incoming sound stream, it will throw various “hypotheses” about what it thinks is the correct rendering of the speech input.  When it finally determines the correct value, and matches it to a value in the associated grammar objects, it throws a speech recognized event, rather than a speech hypothesized event.  If the determined word or phrase does not have a match in any associated grammar, a speech recognition rejected event (which we do not use in the present project) will be thrown instead.


In between, we set up rules to determine which words and phrases will throw a speech recognized event by configuring a Grammar object and associating it with our instance of the speech recognition engine.  In the sample code above, we configure a very simple rule which states that a speech recognized event will be thrown if any of the following words: “cut“, “copy“, “paste“, and “delete“, is uttered.  Note that we use a GrammarBuilder class to construct our custom grammar, and that the syntax of the GrammarBuilder class closely resembles the syntax of the StringBuilder class.


This is the basic code for enabling voice commands for a WinForms application.  We will now enhance the Speechpad application by adding a menu item to turn speech recognition on and off,  a status bar so we can watch as the speech recognition engine interprets our words, and a function that will determine what action to take if one of our key words is captured by the engine.


Add a new menu item labeled “Speech Recognition” under the “Speech” menu item, below “Read Selected Text” and “Read Document”.  For convenience, name it speechRecognitionMenuItem.  Add a handler to the new menu item, and use the following code to turn speech recognition on and off, as well as toggle the speech recognition menu item.  Besides the RecognizeAsync() method that we use here, it is also possible to start the engine synchronously or, by passing it a RecognizeMode.Single parameter, cause the engine to stop after the first phrase it recognizes. The method we use to stop the engine, RecognizeAsyncStop(), is basically a polite way to stop the engine, since it will wait for the engine to finish any phrases it is currently processing before quitting. An impolite method, RecognizeAsyncCancel(), is also available — to be used in emergency situations, perhaps.

        private void speechRecognitionMenuItem_Click(object sender, EventArgs e)
{
if (this.speechRecognitionMenuItem.Checked)
{
TurnSpeechRecognitionOff();
}
else
{
TurnSpeechRecognitionOn();
}
}

private void TurnSpeechRecognitionOn()
{
recognizer.RecognizeAsync(RecognizeMode.Multiple);
this.speechRecognitionMenuItem.Checked = true;
}

private void TurnSpeechRecognitionOff()
{
if (recognizer != null)
{
recognizer.RecognizeAsyncStop();
this.speechRecognitionMenuItem.Checked = false;
}
}


We are actually going to use the RecognizeAsyncCancel() method now, since there is an emergency situation. The speech synthesizer, it turns out, cannot operate if the speech recognizer is still running. To get around this, we will need to disable the speech recognizer at the last possible moment, and then reactivate it once the synthesizer has completed its tasks. We will modify the ReadAloud() method to handle this.


private void ReadAloud(string speakText)
{
try
{
SetVoice();
recognizer.RecognizeAsyncCancel();
synthesizer.Speak(speakText);
recognizer.RecognizeAsync(RecognizeMode.Multiple);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}

}

The user now has the ability to turn speech recognition on and off. We can make the application more interesting by capturing the speech hypothesize event and displaying the results to a status bar on the Main form.  Add a StatusStrip control to the Main form, and a ToolStripStatusLabel to the StatusStrip with its Spring property set to true.  For convenience, call this label toolStripStatusLabel1.  Use the following code to handle the speech hypothesized event and display the results:

private void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
GuessText(e.Result.Text);
}

private void GuessText(string guess)
{
toolStripStatusLabel1.Text = guess;
this.toolStripStatusLabel1.ForeColor = Color.DarkSalmon;
}


Now that we can turn speech recognition on and off, as well as capture misinterpretations of the input stream, it is time to capture the speech recognized event and do something with it.  The SpeechToAction() method will evaluate the recognized text and then call the appropriate method in the child form (these methods are accessible because we scoped them internal in the Textpad code above).  In addition, we display the recognized text in the status bar, just as we did with hypothesized text, but in a different color in order to distinguish the two events.


private void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
string text = e.Result.Text;
SpeechToAction(text);
}

private void SpeechToAction(string text)
{
TextDocument document = ActiveMdiChild as TextDocument;
if (document != null)
{
DetermineText(text);

switch (text)
{
case “cut”:
document.Cut();
break;
case “copy”:
document.Copy();
break;
case “paste”:
document.Paste();
break;
case “delete”:
document.Delete();
break;
}
}
}

private void DetermineText(string text)
{
this.toolStripStatusLabel1.Text = text;
this.toolStripStatusLabel1.ForeColor = Color.SteelBlue;
}


Now let’s take Speechpad for a spin.  Fire up the application and, if it compiles, create a new document.  Type “Hello world.”  So far, so good.  Turn on speech recognition by selecting the Speech Recognition item under the Speech menu.  Highlight “Hello” and say the following phrase into your expensive microphone, inexpensive microphone, or speaker: delete.  Now type “Save the cheerleader, save the”.  Not bad at all.

7 thoughts on “Speech Recognition And Synthesis Managed APIs In Windows Vista: Part II

  1. Appreciate the compelling point of view drafted in Speech Recognition And Synthesis Managed APIs In Windows Vista: Part II. I was considering all these remarks and discovered that many of them are in truth exclusively for the linkups. I guessed about jumping over your blog because of it but determined instead to just join id to just join i Keep the points arriving. As the terminator stated I will be back.

  2. I never afraid to order research papers at the essays help uk service because I understand that only experts are able to assist me!

  3. Hello, nice day.. Your article is quite uplifting. I never believed that it was feasible to accomplish something like that until after I looked over your post.

  4. This is a fantastic guide you have more to publish these kinds of content articles since so dumb, I observed the response to my question. I wish you continued results in creating.

Comments are closed.