Speech Recognition And Synthesis Managed APIs in Windows Vista: Part III

Voice command technology, as exemplified in Part II, is probably the most useful and easiest to implement aspect of the Speech Recognition functionality provided by Vista.  In a few days of work, almost any existing application can be enabled to use it, and the potential for streamlining workflow and making it more efficient is truly breathtaking.  The cool factor, of course, is also very high.

Having grown up watching Star Trek reruns, however, I can’t help but feel that the dictation functionality is much more interesting than the voice command functionality.  Computers are meant to be talked to and told what to do, as in that venerable TV series, not cajoled into doing tricks for us based on finger motions over a typewriter.  My long-term goal is to be able to code by talking into my IDE in order to build UML diagrams and then, at a word, turn that into an application.  What a brave new world that will be.  Toward that end, the SR managed API provides the DictationGrammar class.

Whereas the Grammar class works as a gatekeeper, restricting the phrases that get through to the SpeechRecognized handler down to a select set of rules, the DictationGrammar class, by default, kicks out the jams and lets all phrases through to the recognized handler.
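The contrast can be sketched in a few lines. This is a hypothetical fragment rather than Speechpad code, assuming only the System.Speech.Recognition namespace used throughout this series:

```csharp
using System.Speech.Recognition;

// A command Grammar only lets specific phrases through...
GrammarBuilder commands = new GrammarBuilder();
commands.Append(new Choices("cut", "copy", "paste", "delete"));
Grammar commandGrammar = new Grammar(commands);

// ...while a DictationGrammar recognizes free-form speech.
DictationGrammar dictation = new DictationGrammar();

SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
engine.LoadGrammar(commandGrammar); // recognizes "cut", "copy", etc. only
engine.LoadGrammar(dictation);      // now arbitrary phrases get through too
```

Multiple grammars can be loaded side by side; the engine matches command phrases against the restrictive grammar and everything else against the dictation grammar.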

In order to make Speechpad a dictation application, we will add the default DictationGrammar object to the list of grammars used by our speech recognition engine.  We will also add a toggle menu item to turn dictation on and off.  Finally, we will alter the SpeechToAction() method to insert any phrases that are not voice commands into the current Speechpad document as text.  Create a local instance of DictationGrammar for our Main form, and then instantiate it in the Main constructor.  Your code should look like this:

        #region Local Members

        private SpeechSynthesizer synthesizer = null;
        private string selectedVoice = string.Empty;
        private SpeechRecognitionEngine recognizer = null;
        private DictationGrammar dictationGrammar = null;

        #endregion
        
        public Main()
        {
            InitializeComponent();
            synthesizer = new SpeechSynthesizer();
            LoadSelectVoiceMenu();
            recognizer = new SpeechRecognitionEngine();
            InitializeSpeechRecognitionEngine();
            dictationGrammar = new DictationGrammar();
        }
        

Create a new menu item under the Speech menu and label it “Take Dictation”.  Name it takeDictationMenuItem for convenience. Add a handler for the click event of the new menu item, and stub out TurnDictationOn() and TurnDictationOff() methods.  TurnDictationOn() works by loading the local dictationGrammar object into the speech recognition engine. It also needs to turn speech recognition on if it is currently off, since dictation will not work if the speech recognition engine is disabled. TurnDictationOff() simply removes the local dictationGrammar object from the speech recognition engine’s list of grammars.

		
        private void takeDictationMenuItem_Click(object sender, EventArgs e)
        {
            if (this.takeDictationMenuItem.Checked)
            {
                TurnDictationOff();
            }
            else
            {
                TurnDictationOn();
            }
        }

        private void TurnDictationOn()
        {
            if (!speechRecognitionMenuItem.Checked)
            {
                TurnSpeechRecognitionOn();
            }
            recognizer.LoadGrammar(dictationGrammar);
            takeDictationMenuItem.Checked = true;
        }

        private void TurnDictationOff()
        {
            if (dictationGrammar != null)
            {
                recognizer.UnloadGrammar(dictationGrammar);
            }
            takeDictationMenuItem.Checked = false;
        }
        

For an extra touch of elegance, alter the TurnSpeechRecognitionOff() method by adding a line of code to turn dictation off when speech recognition is disabled:

        TurnDictationOff();
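TurnSpeechRecognitionOff() itself was written back in Part II. As a sketch of where the new line belongs (your Part II implementation may differ in its details, and the RecognizeAsyncStop() call here is an assumption about how recognition is shut down):

```csharp
private void TurnSpeechRecognitionOff()
{
    if (recognizer != null)
    {
        // Dictation rides on top of the recognizer, so disable it first.
        TurnDictationOff();
        recognizer.RecognizeAsyncStop();
    }
    speechRecognitionMenuItem.Checked = false;
}
```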

Finally, we need to update the SpeechToAction() method so it will insert any text that is not a voice command into the current Speechpad document.  Use the default case of the switch statement to call the InsertText() method of the current document.

        
        private void SpeechToAction(string text)
        {
            TextDocument document = ActiveMdiChild as TextDocument;
            if (document != null)
            {
                DetermineText(text);
                switch (text)
                {
                    case "cut":
                        document.Cut();
                        break;
                    case "copy":
                        document.Copy();
                        break;
                    case "paste":
                        document.Paste();
                        break;
                    case "delete":
                        document.Delete();
                        break;
                    default:
                        document.InsertText(text);
                        break;
                }
            }
        }

        

With that, we complete the speech recognition functionality for Speechpad. Now try it out. Open a new Speechpad document and type “Hello World.”  Turn on speech recognition.  Select “Hello” and say “delete.”  Turn on dictation.  Say “brave new.”

This tutorial has demonstrated the essential code required to use speech synthesis, voice commands, and dictation in your .NET 2.0 Vista applications.  It can serve as the basis for building speech recognition tools that take advantage of default as well as custom grammar rules to build advanced application interfaces.  Aside from the strange compatibility issues between Vista and Visual Studio, at the moment the greatest hurdle to using the Vista managed speech recognition API is the remarkable dearth of documentation and samples.  This tutorial is intended to help alleviate that problem by providing a hands-on introduction to this fascinating technology.
