The Imaginative Universal

Studies in Virtual Phenomenology -- @jamesashley

Jabberwocky

April 02
by James Ashley 2. April 2007 18:50

 

Download SAPISophiaDemo.zip - 2,867.5 KB

 

Following on the heels of the project I have been working on for the past month, a chatterbot (also called a chatbot) with speech recognition and text-to-speech functionality, I came across the following excerpted article in The Economist, available here if you happen to be a subscriber, and here if you are not:

 

Chatbots have already been used by some companies to provide customer support online via typed conversations. Their understanding of natural language is somewhat limited, but they can answer basic queries. Mr Carpenter wants to combine the flexibility of chatbots with the voice-driven "interactive voice-response" systems used in many call centres to create a chatbot that can hold spoken conversations with callers, at least within a limited field of expertise such as car insurance.

This is an ambitious goal, but Mr Carpenter has the right credentials: he is the winner of the two most recent Loebner prizes, awarded in an annual competition in which human judges try to distinguish between other humans and chatbots in a series of typed conversations. His chatbot, called Jabberwacky, has been trained by analysing over 10m typed conversations held online with visitors to its website (see jabberwacky.com). But for a chatbot to pass itself off as a human agent, more than ten times this number of conversations will be needed, says Mr Carpenter. And where better to get a large volume of conversations to analyse than from a call centre?

Mr Carpenter is now working with a large Japanese call-centre company to develop a chatbot operator. Initially he is using transcripts of conversations to train his software, but once it is able to handle queries reliably, he plans to add speech-recognition and speech-synthesis systems to handle the input and output. Since call-centre conversations tend to be about very specific subjects, this is a far less daunting task than creating a system able to hold arbitrary conversations.

 

Jabberwacky is a slightly different beast from the AIML infrastructure I used in my project. Jabberwacky is a heuristics-based technology, whereas AIML is a design-based one that requires somebody to actually anticipate user interactions and try to script them.

All the same, it is a pleasant experience to find that one is serendipitously au courant, when one's intent was to be merely affably retro.


Software | Speech Recognition

SophiaBot: What I've been working on for the past month...

April 02
by James Ashley 2. April 2007 16:03

I have been busy in my basement constructing a robot with which I can have conversations and play games. Except that the robot is more of a program, and I didn't build the whole thing up from scratch, but instead cobbled together pieces that other people have created. I took an Eliza-style interpreter written by Nicholas H. Tollervey (this is the conversation part) along with some scripted dialogs by Dr. Richard S. Wallace and threw it together with a Z-machine program written by Jason Follas, which allows my bot to play old Infocom games like Zork and The Hitchhiker's Guide to the Galaxy. I then wrapped these up in a simple workflow and added some new Vista/.NET 3.0 speech recognition and speech synthesis code so the robot can understand me and speak back.

I wrote an article about it for CodeProject, a very nice resource that allows developers from around the world to share their code and network. The site requires registration to download code, however, so if you want to play with the demo or look at the source code, you can also download them from this site.

Mr. Tollervey has a succinct article about the relationship between chatterbots and John Searle's Chinese Room argument, which relieves me of the responsibility of discussing the same.

Instead, I'll just add some quick instructions:

 

The application is made up of a text output screen, a text entry field, and a default enter button. The initial look and feel is that of an IBM XT theme (the first computer I ever played on). This can be changed using voice commands, which I will cover later. There are three menus initially available. The File menu allows the user to save a log of the conversation as a text file. The Select Voice menu allows the user to select from any of the synthetic voices installed on the machine. Vista initially comes with "Anna". Windows XP comes with "Sam". Other XP voices are available depending on which versions of Office have been installed over the lifetime of that particular instance of the OS. If the user is running Vista, then the Speech menu allows the user to toggle speech synthesis, dictation, and the context-free grammars. With these turned on, the user can speak to the application and have the application speak back. If the user is running XP, then only speech synthesis is available, since some of the features provided by .NET 3.0 and consumed by this application do not work on XP.

The appearance menu will let you change the look and feel of the text screen. I've also added some pre-made themes at the bottom of the appearance menu. If, after chatting with SophiaBot for a while, you want to play a game, just type or say "Play game." SophiaBot will present you with a list of the games available. (You can add more, actually, simply by dropping additional game files you find on the internet into the Program Files\Imaginative Universal\SophiaBot\Game Data\DATA folder. Jason's Z-Machine implementation plays games that use version 3 and below of the game engine. I'm looking, rather lazily, into how to support later versions.) You can go here to download more Zork-type games. During a game, type or say "Quit" to end your session. "Save" and "Restore" keep track of your current position in the game, so you can come back later and pick up where you left off.

Speech recognition in Vista has two modes: dictation and context-free recognition. Dictation uses context, that is, an analysis of preceding words and words following a given target of speech recognition, in order to determine what word was intended by the speaker. Context-free speech recognition, by way of contrast, uses exact matches and some simple patterns in order to determine if certain words or phrases have been uttered. This makes context-free recognition particularly suited to command and control scenarios, while dictation is particularly suited to situations where we are simply attempting to translate the user's utterances into text.
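The difference between the two modes can be illustrated at the text level. What follows is a minimal Python sketch, not SophiaBot's .NET code: it works on strings rather than audio, and the names COMMAND_PATTERNS and match_command are hypothetical. The phrases are the ones listed in the Command\AIML\Game Lexicon paragraph of this post. A real speech grammar constrains what the recognizer listens for, whereas this sketch only filters text after the fact.

```python
import re

# Fixed phrases and one-slot patterns, modelled on the post's command list.
# The command names on the left are invented for this illustration.
COMMAND_PATTERNS = [
    ("list_colors", re.compile(r"^LIST COLORS$")),
    ("list_games", re.compile(r"^LIST GAMES$")),
    ("list_fonts", re.compile(r"^LIST FONTS$")),
    ("change_font_color", re.compile(r"^CHANGE FONT COLOR TO (?P<value>.+)$")),
    ("change_background_color", re.compile(r"^CHANGE BACKGROUND COLOR TO (?P<value>.+)$")),
    ("change_font", re.compile(r"^CHANGE FONT TO (?P<value>.+)$")),
]


def match_command(utterance):
    """Return (command_name, slot_value) for a known phrase, or None."""
    # Normalize case and collapse runs of whitespace before matching.
    text = " ".join(utterance.upper().split())
    for name, pattern in COMMAND_PATTERNS:
        found = pattern.match(text)
        if found:
            # Patterns without a slot have no "value" group, so this is None.
            return name, found.groupdict().get("value")
    return None
```

Anything that returns None here is not a command. In a dictation-style flow it would simply be passed along as free text to the conversation engine.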

You should begin by trying to start up a conversation with Sophia using the textbox, just to see how it works, as well as her limitations as a conversationalist. Sophia uses certain tricks to appear more lifelike. She throws out random typos, for one thing. She also is a bit slower than a computer should really be. This is because one of the things that distinguish computers from people is the way they process information -- computers do it quickly, and people do it at a more leisurely pace. By typing slowly, Sophia helps the user maintain his suspension of disbelief. Finally, if a text-to-speech engine is installed on your computer, Sophia reads along as she types out her responses. I'm not certain why this is effective, but it is how computer terminals are shown to communicate in the movies, and it seems to work well here, also. I will go over how this illusion is created below.
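The slow, typo-prone typing described above can be sketched in a few lines. This is a minimal Python illustration, not SophiaBot's actual .NET code. The function names, the 5% typo rate, the 0.08 second delay, and the choice to correct each slip with a backspace are all assumptions made for the example.

```python
import random
import string
import sys
import time


def keystrokes(text, typo_rate=0.05, rng=None):
    """Yield the characters to 'type', sometimes adding a wrong letter and a backspace."""
    rng = rng or random.Random()
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            yield rng.choice(string.ascii_lowercase)  # the slip
            yield "\b"                                # step back over it
        yield ch                                      # the intended character


def apply_backspaces(chars):
    """Collapse a keystroke stream into the text finally left on screen."""
    out = []
    for ch in chars:
        if ch == "\b":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


def type_out(text, delay=0.08, typo_rate=0.05, rng=None, out=sys.stdout):
    """Write text one keystroke at a time, pausing between keystrokes."""
    for ch in keystrokes(text, typo_rate, rng):
        out.write(ch)
        out.flush()
        time.sleep(delay)
    out.write("\n")
```

Calling type_out("Hello, I am Sophia.") in a terminal prints the sentence at a leisurely pace. On most terminals the backspace moves the cursor back one column, so the correct letter overwrites the slip.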

In Command\AIML\Game Lexicon mode, the application generates several grammar rules that help direct speech recognition toward certain expected results. Be forewarned: initially loading the AIML grammars takes about two minutes, and occurs in the background. You can continue to type conversations with Sophia until the speech recognition engine has finished loading the grammars and speech recognition is available. Using the command grammar, the user can make the computer do the following things: LIST COLORS, LIST GAMES, LIST FONTS, CHANGE FONT TO..., CHANGE FONT COLOR TO..., CHANGE BACKGROUND COLOR TO.... Besides the IBM XT color scheme, a black Papyrus font on a linen background also looks very nice. To see a complete list of keywords used by the text-adventure game you have chosen, say "LIST GAME KEYWORDS." When a game is first selected, a new set of rules is created from the two-word combinations of the keywords recognized by the game; this helps speech recognition by narrowing down the total number of phrases it must look for.
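The two-word rule generation can be sketched as follows. This is a rough Python illustration under stated assumptions, not the application's .NET code: it assumes that every ordered pair of distinct keywords becomes one phrase, and the names two_word_phrases and recognize, like the sample keywords in the note after the block, are hypothetical.

```python
from itertools import permutations


def two_word_phrases(keywords):
    """Return every ordered pair of distinct keywords as an upper-case phrase."""
    # Deduplicate and normalize so "lamp" and "LAMP " count as one keyword.
    unique = sorted({k.strip().upper() for k in keywords if k.strip()})
    return [f"{first} {second}" for first, second in permutations(unique, 2)]


def recognize(utterance, phrases):
    """Exact-match lookup: return the phrase if it is in the grammar, else None."""
    candidate = " ".join(utterance.upper().split())
    return candidate if candidate in phrases else None
```

With four keywords such as "open", "mailbox", "take", and "lamp", two_word_phrases yields 12 phrases, and only those 12 can be recognized. The list grows roughly with the square of the vocabulary, which is still far smaller than the open-ended search that dictation performs.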

In dictation mode, the underlying speech engine simply converts your speech into words and has the core SophiaBot code process it in the same manner that it processes text that is typed in. Dictation mode is sometimes better than context-free mode for non-game speech recognition, depending on how well the speech recognition engine installed on your OS has been trained to understand your speech patterns. Context-free mode is typically better for game mode. Command and control only works in context-free mode.

Do Computers Read Electric Books?

March 01
by James Ashley 1. March 2007 16:46

In the comments section of a blog I like to frequent, I have been pointed to an article in the International Herald Tribune about Pierre Bayard's new book, How to Talk About Books You Haven't Read.

Bayard recommends strategies such as abstractly praising the book, offering silent empathy regarding someone else's love for the book, discussing other books related to the book in question, and finally simply talking about oneself.  Additionally, one can usually glean enough information from reviews, book jackets and gossip to sustain the discussion for quite a while.

Students, he noted from experience, are skilled at opining about books they have not read, building on elements he may have provided them in a lecture. This approach can also work in the more exposed arena of social gatherings: the book's cover, reviews and other public reaction to it, gossip about the author and even the ongoing conversation can all provide food for sounding informed.

I've recently been looking through some AI experiments built on language scripts, based on the 1966 software program Eliza, which used a small script of canned questions to maintain a conversation with computer users. You can play a web version of Eliza here, if you wish. It should be pointed out that the principles behind Eliza are the same as those that underpin the famous Turing Test. Turing proposed answering the question "Can machines think?" by staging an ongoing experiment to see if machines can imitate thinking. The proposal was made in his 1950 paper Computing Machinery and Intelligence:

The new form of the problem can be described in terms of a game which we call the 'imitation game.' It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." The interrogator is allowed to put questions to A and B thus:

C: Will X please tell me the length of his or her hair?

Now suppose X is actually A, then A must answer. It is A's object in the game to try and cause C to make the wrong identification. His answer might therefore be:

"My hair is shingled, and the longest strands are about nine inches long."

In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as "I am the woman, don't listen to him!" to her answers, but it will avail nothing as the man can make similar remarks.

We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"

The standard form of the current Turing experiments is something called a chatterbot application. Chatterbots separate the mechanism for generating dialog from the dialog scripts themselves by utilizing a set of rules written in a common format. The most popular format happens to be an XML standard called AIML (Artificial Intelligence Markup Language).
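The separation of engine from script is easier to see in a small example. The following Python sketch is an illustration only. The three categories are hypothetical, and the matcher is a deliberate simplification: it tries categories in file order and supports only the * wildcard and the star element, whereas full AIML interpreters use a more elaborate matching order and many more tags.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical three-rule script; the dialog lives here, not in the code.
AIML = """
<aiml version="1.0">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there. Read any good books lately?</template>
  </category>
  <category>
    <pattern>HAVE YOU READ *</pattern>
    <template>I have heard a great deal about <star/>. What did you make of it?</template>
  </category>
  <category>
    <pattern>*</pattern>
    <template>That reminds me of something I read recently.</template>
  </category>
</aiml>
"""


def load_categories(source):
    """Parse AIML text into (compiled_pattern, template_element) pairs."""
    rules = []
    for category in ET.fromstring(source).iter("category"):
        words = category.findtext("pattern").split()
        # Each * becomes a capture group; every other word must match literally.
        body = r"\s+".join("(.+)" if w == "*" else re.escape(w) for w in words)
        rules.append((re.compile("^" + body + "$", re.IGNORECASE), category.find("template")))
    return rules


def render(template, stars):
    """Flatten a template, replacing each <star/> with the first wildcard capture."""
    parts = [template.text or ""]
    for child in template:
        if child.tag == "star" and stars:
            parts.append(stars[0])
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def respond(rules, user_input):
    """Return the reply for the first category whose pattern matches."""
    text = re.sub(r"[^\w\s]", "", user_input).strip()  # drop punctuation
    for regex, template in rules:
        match = regex.match(text)
        if match:
            return render(template, match.groups())
    return ""
```

After rules = load_categories(AIML), calling respond(rules, "Have you read Moby Dick?") fills the wildcard into the second template. Changing what the bot says means editing the XML, not the Python, which is the point of the common format.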

What I'm interested in, at the moment, is not so much whether I can write a script that will fool people into thinking they are talking with a real person, but rather whether I can write a script that makes small talk by discussing the latest book.  If I can do this, it should validate Pierre Bayard's proposal, if not Alan Turing's.


Programming | Speech Recognition | Virtual Reality