Augmented Reality – Page 3 – The Imaginative Universal

Projecting Augmented Reality Worlds

November 7, 2014 James Ashley

In my last post, I discussed the incredible work being done with augmented reality by Magic Leap. This week I want to talk about implementing augmented reality with projection rather than with glasses.

To be more accurate, many varieties of AR experience are projection based. The technical differences depend on which surface is being projected on. Google Glass projects on a surface centimeters from the eye. Magic Leap is reported to project directly on the retina (virtual retinal display technology).

AR experiences being developed at Microsoft Research, which I had the pleasure of visiting this past week during the MVP Summit, are projected onto pre-existing rooms without the need to rearrange the room itself. Using fairly common projection mapping techniques combined with very cool technology such as the Kinect and Kinect v2, the room is scanned and appropriate distortions are created to make projected objects look “correct” to the observer.

An important thing to bear in mind as you look through the AR examples below is that they are not built using esoteric research technology. These experiences are all built using consumer-grade projectors, Kinect sensors and Unity 3D. If you are focused and have a sufficiently strong desire to create magic, these experiences are within your reach.

The most recent work created by this group (led by Andy Wilson and Hrvoje Benko) is a special version of RoomAlive they created for Halloween called The Other Resident. Just to prove I was actually there, here are some pictures of the lab along with the Kinect MVPs amazed that we were being allowed to film everything given that most of the MVP Summit involves NDA content we are not allowed to repeat or comment on.

IllumiRoom is a precursor to the more recent RoomAlive project. The basic concept is to extend the visual experience on the gaming display or television with extended content that responds dynamically to what is seen onscreen. If you think it looks cool in the video, please know that it is even cooler in person. And if you like it and want it in your living room, then comment on this thread or on the YouTube video itself to let them know it is definitely an M viable product for the Xbox One, as the big catz say.

The RoomAlive experience is the crown jewel at the moment, however. RoomAlive uses multiple projectors and Kinect sensors to scan a room and then use it as a projection surface for interactive, procedural games: in other words, augmented reality.

A fascinating aspect of the RoomAlive experience is how it handles appearance-preserving, point-of-view dependent visualizations: the way objects need to be distorted in order to appear correct to the observer. In the Halloween experience at the top, you’ll notice that the animation of the old crone looks like it is positioned in front of the chair she is sitting on even though the projection surface is actually partially extended in front of the chair back and at the same time extended several feet behind the chair back for the shoulders and head. In the RoomAlive video just above you’ll see the view-dependent distortion occurring with the running soldier changing planes at about 2:32.
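
As a rough illustration of the geometry (a minimal sketch, not RoomAlive's actual code), the core of a view-dependent projection is a ray cast: to make a virtual point look correct to one observer, you light up the spot where the line from the observer's eye through the virtual point hits the real surface. The function name and the flat-wall assumption below are mine; a real room would use the scanned Kinect geometry instead of a single plane.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def surface_point_for_viewer(eye: Vec3, virtual_point: Vec3,
                             plane_point: Vec3, plane_normal: Vec3) -> Optional[Vec3]:
    """Where on a flat surface to draw so `virtual_point` looks right from `eye`.

    Hypothetical helper: the surface is assumed to be a plane given by one
    point on it and its normal. Returns None if the viewer cannot see the
    surface along that ray.
    """
    direction = _sub(virtual_point, eye)          # ray from the eye through the virtual point
    denom = _dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                               # ray runs parallel to the surface
    t = _dot(plane_normal, _sub(plane_point, eye)) / denom
    if t <= 0:
        return None                               # surface is behind the viewer
    return (eye[0] + t * direction[0],
            eye[1] + t * direction[1],
            eye[2] + t * direction[2])


# Two viewers, one virtual point floating in front of a wall at z = 4.
wall_point, wall_normal = (0.0, 0.0, 4.0), (0.0, 0.0, 1.0)
ghost = (0.0, 1.0, 2.0)
print(surface_point_for_viewer((0.0, 0.0, 0.0), ghost, wall_point, wall_normal))
print(surface_point_for_viewer((1.0, 0.0, 0.0), ghost, wall_point, wall_normal))
```

The two viewers get different wall positions for the same virtual point, which is exactly why a single projected image can only be correct for one point of view at a time.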

You would think that these appearance-preserving PDV techniques would fall apart anytime you have more than one person in the room. To address this problem, Hrvoje and Andy worked on another project that plays with perception and physical interactions to integrate two overlapping experiences in a Wizard Battle scenario called Mano-a-Mano or, more technically, Dyadic Projected Spatial Augmented Reality. The globe visualization at 2:46 is particularly impressive.

My head is actually still spinning following these demos and I’m still in a bit of a fugue state. I’ve had the opportunity to see lots of cool 3D modeling, scanning, virtual experiences, and augmented reality experiences over the past several years and felt like I was on top of it, but what MSR is doing took me by surprise, especially when it was laid out sequentially as it was for us. A tenth of the work they have been doing over the past two years could easily be the seed of an idea for any number of tech startups.

In the middle of the demos, I leaned over to one of the other MVPs and whispered in his ear that I felt like Steve Jobs at Xerox PARC seeing the graphical user interface and mouse for the first time. He just stroked his beard and nodded. It was a magic moment.

Why Magic Leap is Important

October 27, 2014 James Ashley

This past weekend a neighbor invited our entire subdivision to celebrate an Indian holiday called Diwali with them – the Festival of Lights. Like many traditions that immigrant families carry to the New World in their luggage, it had become an amalgamation of old and new. The hosts and other Indians from the neighborhood wore traditional South Asian formalwear. I was painfully underdressed in an old oxford, chinos and flip-flops. Others came in the formalwear of their native countries. Some just put on jackets and ties. We organized this Diwali as a pot-luck and had an interesting mix of biryanis, spaghetti, enchiladas, pancakes with syrup, borscht, tomato korma, Vietnamese spring rolls and puri.

The most important part of the celebration was the lighting of fireworks. For about two solid hours, children ran through a smoky cul-de-sac waving sparklers while firecrackers went off around them. Towards the end of this celebration, one of our hosts pulled out her iPhone in order to FaceTime with her father in India and show him the children playing in the background just as they would have back home, forming a line of continuity between continents using a 1,500-year-old ritual and an international cellular system. Diwali is called the Festival of Lights, according to Wikipedia, because it celebrates the spiritual victory of light over darkness and ignorance.

When I got home I did some quick calculations. Getting to that Apple moment our host had with her father – we no longer have Hallmark moments but only Apple moments today – took approximately seven years. This is the amount of time it takes for a technology to go from seeming fantastic and impractical – because we don’t believe it can be done and can’t imagine how we would use it in everyday life if it were – to being unexceptional.

Video conferencing has been a staple of science fiction for ages, from 2001: A Space Odyssey to Star Trek. It was only in 2010, however, that Apple announced the FaceTime app, making it generally available to anyone who could afford an iPhone. I’m basing the seven years from fantasy to facticity, though, on the length of time since the initial release of the iPhone in 2007.

Magic Leap, the digital reality technology that has just received half a billion dollars of funding from companies like Google, is important because it points the way to what can happen in the next seven years. I will paint a picture for you of what a world with this kind of digital reality technology will look like and it’s perfectly okay if you feel it is too out there. In fact, if you end up thinking what I’m describing is plausible, then I haven’t done a good enough job of portraying that future.

Magic Leap is creating a wearable product which may or may not be called Dragonstone glasses and which may or may not be a combination of light field technology – like that used in the Lytro camera – and depth detection – like the Kinect sensor. They are very secretive about what they are doing exactly. When Magic Leap CEO Rony Abovitz talks about his product, however, he uses code to indicate what it is and what it isn’t.

In an interview with David Lidsky, Abovitz let slip that Dragonstone is “not holography, it’s not stereoscopic 3-D. You don’t need a giant robot to hold it over your head, you don’t need to be at home to use it. It’s not made from off-the-shelf parts. It’s not a cellphone in a View-Master.” At first reading, this seems like a quick swipe at Oculus Rift, the non-mobile, stereoscopic virtual reality solution built from consumer parts by Oculus VR and, secondarily, Samsung Gear VR, the mobile add-on to Samsung’s Galaxy Note 4 that turns it into a virtual reality device with stereoscopic audio. Dig a little deeper, however, and it’s apparent that his grand sweep of dismissal takes in a long list of digital reality plays over the years.

Let’s start with holography. Actually, let’s start with a very specific hologram.

The 1977 holographic chess game from Star Wars is the precursor to both virtual and augmented reality as we think of them – for convenience, I am including them all under the “digital reality” rubric. No child saw this and didn’t want it. From George Lucas’s imaginative leap, we already see an essential aspect of the digital experience we crave that differentiates it from the actual technology we have. Actual holography involves a frame that we view the virtual image through. In Lucas’s vision, however, the holograms take up space and have a location.

What’s intriguing about the Star Wars scene is that as a piece of film magic, the technology behind the chess game wasn’t particularly innovative. It’s pretty much just the same stop-motion techniques Ray Harryhausen and others had been using since the 50’s and involves superimposing an animated scene over a live scene. The difference comes in how George Lucas incorporates it into the story. Whereas all the earlier films that mixed live and animated sequences sought to create the illusion that the monsters were real, in the battle chess scene, it is clear that they are not – for instance because they are semi-transparent. Because the elements of the chess game are explicitly not real within the movie narrative – unlike Wookiees, Hutts, and Tauntauns – they are suddenly much more interesting. They are something we can potentially recreate.

The difference between virtual reality and augmented reality is similarly one of context. Which is which depends on how we, as the observer, are related to the digital experience. In the case of augmented reality, the context is the real world into which digital objects are inserted. An example of this occurs in Empire Strikes Back [1980], where the binoculars on Hoth provide additional information presented as an overlay on the real world.

The popular conception of virtual reality, as opposed to the technical accomplishment, probably dates to the publication of William Gibson’s Neuromancer in 1984. Gibson’s “cyberspace” is a fully digital immersive world. Unlike augmented reality where the context is our reality, in cyberspace the context is a digital space into which we, as observers and participants, are superimposed.

To schematize the difference, in augmented reality, reality is the background and digital content is in the foreground; in virtual reality, the background that we perceive is digital while the foreground is a combination of digital and actual objects. I find this to be a clean way of distinguishing the two and preferable to the tendency to distinguish them based on different degrees of immersion. To the extent that contemporary VR is built around improving the video game experience, we see that POV games have, as a goal, to create increasingly realistic worlds – but what is more realistic than the real world? On the other side, augmented reality, when done right, has the potential to be incredibly immersive.

We can subdivide augmented reality even further. We’ll actually need to in order to elucidate why AR in Magic Leap is different from AR in Google Glass. Overlaying digital content on top of reality can take several forms and tends to fall along two axes. An AR experience is either POV or non-POV. It can also be either informational or interactive.

Augmented Reality in the POV-Informatics quadrant is often called Terminator Vision after the 1984 sci-fi Austrian body-builder augmented film. I’m not sure why a computer, the Terminator, would need a display to present data to itself, but in terms of the narrative it does wonders for the audience. It gives a completely false sense of what it must be like to think like a computer.

Experiences in the non-POV-Informatics quadrant are typically called Heads-Up Displays or HUDs. They have their source in military applications but are probably best known from first-person shooters where the view-point is tied to objects like windshields or gun-sights rather than to the point-of-view of the player. They also don’t take up the entire view and consequently we can look away from them – unlike Terminator Vision. Google Glass is actually an example of a HUD – though it is sometimes mistaken for Terminator Vision – since the display only fills up the right corner of the visual field.

Non-POV interactive can be either magic mirror experiences or hand-held games and advertisements involving fiducials. This is a common way of creating augmented reality experiences for the iPad and smartphones. The device camera is pointed toward a fiducial, such as a picture in a catalog, and a 3-D model is layered over the video returned by the camera. Interestingly, Qualcomm, one of the backers in Magic Leap’s recent round of funding, is also a leader in developing tools for this type of AR experience.

POV interactive, the final quadrant, is where Magic Leap falls. I don’t need to describe it because its exemplar is the sort of experience that Rony Abovitz says Dragonstone is not – the hologram from Star Wars. The difference is that where Abovitz is referring to the sort of holography we can do in actual reality, Magic Leap’s technology is the kind of holography that, so far, we have only been able to do in the movies.

If you examine the two images I’ve included from Star Wars IV, you’ll notice that the holograms are seen not from a single point of view but from multiple points of view. This is a feature of persistent augmented reality. The digital AR objects virtually exist in a real-world location and exist that way for multiple people. Even though Luke and Ben have different photons shooting at their eyes displaying the image of Leia from different perspectives, they are nevertheless looking at the same virtual Princess.

This kind of persistence, and the sort of additional technology required to make it work, helps to explain part of the reason Google is interested in it. Google, as we know, already has its own augmented reality play. Where Google brings something new to a POV interactive AR experience is in its expertise in geolocation, without which persistent AR entities would be much harder to create.

This sort of AR experience does not necessarily imply the use of glasses. We don’t know what sort of pseudo-technology is used in the Star Wars universe, but there are indications that it is some sort of projection. In Vernor Vinge’s sci-fi novel Rainbows End [2006], persistent augmented reality is projected on microscopic filaments that people experience without wearables.

Because Magic Leap is creating the experience inside a wearable close-range display, i.e. glasses, additional tricks are required. In addition to geolocation – which is only a guess at this point – it will also require some sort of depth sensor to determine if real-world objects are located between the viewer and the virtual object’s location. If there are, then the occlusion of the virtual entity has to be simulated in the visualization – basically, a chunk has to be cut out of the image.
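
To make the "chunk cut out of the image" idea concrete, here is a minimal sketch of a per-pixel depth test. It is only a guess at the general technique, not anything Magic Leap has described: the function name, the grid-of-lists image format, and the use of None for "nothing drawn here" are all assumptions for illustration.

```python
from typing import List, Optional

Pixel = Optional[str]  # a color label, or None where nothing is drawn


def apply_occlusion(virtual_pixels: List[List[Pixel]],
                    virtual_depth: List[List[float]],
                    real_depth: List[List[float]]) -> List[List[Pixel]]:
    """Hide virtual pixels that sit behind a real-world surface.

    Hypothetical helper: `real_depth` stands in for a depth-sensor frame and
    `virtual_depth` for the rendered object's distance from the viewer, both
    in the same units and aligned pixel for pixel.
    """
    result = []
    for row_px, row_vd, row_rd in zip(virtual_pixels, virtual_depth, real_depth):
        out_row = []
        for px, vd, rd in zip(row_px, row_vd, row_rd):
            # A real object closer than the virtual one blocks the view of it.
            hidden = px is not None and rd < vd
            out_row.append(None if hidden else px)
        result.append(out_row)
    return result


# A virtual whale 3 m away; a real lamp post 2 m away covers the middle column.
whale = [["W", "W", "W"]]
whale_depth = [[3.0, 3.0, 3.0]]
room_depth = [[5.0, 2.0, 5.0]]
print(apply_occlusion(whale, whale_depth, room_depth))
```

The middle pixel is dropped because the real surface there is nearer to the viewer than the virtual object, so the real world shows through the gap.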

If I have described the Magic Leap technology correctly – and there’s a good chance I have not, given the secretiveness around it – then what we are looking at seven years out is a world in which everything we see is constantly being photoshopped in real-time. At a basic level, this fulfills the Magic Leap promise to re-enchant the world with digital entities and also makes sense of their promotional materials.

There are also some interesting side-effects. For one, an augmented world would effectively turn everything and everyone into a potential billboard. Given Google’s participation, this even seems likely. As with the web, advertisements will pay for the content that populates an augmented reality world. Like the web and mobile devices, the same geolocation that makes targeted content possible may also be used to track our behavior.

There are additional social consequences. Many strange aspects of online behavior may make their way into our world. Pseudo-anonymity, which can encourage bad behavior in good people, can become a larger aspect of our world. Instead of appearing as themselves, people may prefer enhanced versions of themselves or even avatars.

In seven years, it may become normal to sit across a conference table from a giant rabbit and Master Chief discussing business strategies. Constant self-reinvention, which is a hallmark of the online experience, may become even more prevalent. In turn, reputation systems may also become more common as a way to curb the problems associated with anonymity. Liking someone I pass in the street may become much more literal.

There is also, however, the cool stuff. Technology, despite all the frequent articles to the contrary, has the power to bring people together. Imagine one day being able to share an indigenous festival with loved ones who live thousands of miles away. My eleven-year-old daughter has grown up with friends from around the world whom she has met online. Technology allows her not only to chat with them with texts, but also to speak with them while she is performing chores or walking around the house. Yet she has never met any of them. In seven years, we may live in a world where physical distance no longer implies emotional distance and where sitting around chatting face-to-face with someone you have never actually met face-to-face does not seem at all strange.

For me, Magic Leap points to a future where physical limitations are no longer limitations in reality.

How Will Battlestar Galactica End?

March 20, 2009 James Ashley

I’m such a loser. Hours away from the BSG finale and I am still blogging about code.

So how will Battlestar Galactica end tonight? I am hoping for a classic Dallas ending. Apollo wakes up on the 1978 Galactica from a nightmare in which he is a sociopath politician named Tom Zarek on an alternate Galactica in which Dr. Z is never found and Earth is a wasteland. It turns out that this is part of an evil plot devised by Count Iblis, but Starbuck, Apollo, Boomer, Lieutenant Zack and Muffit, Boxey’s Daggit, foil Iblis’s scheme and bring sanity back to the Galactica. When Apollo tells Starbuck about their intimate relationship in his dream, it creates some awkwardness, but they work it out in a game of pyramid and then go down to the gambling planet to get good and sloshed.

Can’t wait ’til tonight when I get to see how close I am. It’s definitely a Cristal and Courvoisier kind of evening.

Rock Star

December 30, 2007 James Ashley

Over the past week my family and I have been playing with an early Christmas present, an XBOX game called Rock Band. This game one-ups the other popular rock simulation franchise, Guitar Hero, by allowing the player to perform on guitar, drums or with a mic. A sort of guitar hero meets karaoke, it even allows different players to work together in forming a rock band either over the Internet or in their living room, side-by-side.

I have adopted the guitar as my own instrument for acquiring fame. I have already mastered Don’t Fear the Reaper (no cowbell, unfortunately), Wanted Dead or Alive, Mississippi Queen and Blitzkrieg Bop, at medium difficulty, and am currently rehearsing Suffragette City, which has a wicked A-X-Y riff (XBOX control buttons, naturally, not notes) that I cannot seem to get the hang of.

The game is wonderful, though as I play through I am flabbergasted by the number of songs I do not recognize. Who are Vagiant, or Anarchy Club, or Crooked X? There are other groups I know by name, like Weezer, Radiohead, and Foo Fighters, but until now I couldn’t have named a song by any of them. My knowledge of popular music seems to have ended sometime in the late eighties, and there is a decade and a half lacuna following that which I am loath to fill. Add to this the great number of metal anthems in this game combined with my ignorance of the Bon Jovi catalogue and the Metallica repertoire (though I do know, as who wouldn’t, Rush’s Tom Sawyer and Black Sabbath’s Paranoid), and you may get a sense of the cultural irrelevance that washes over me as each new playlist is thrown onto my screen.

Worse, I believe I threw out my back during the extra points phase of Nine Inch Nails’ The Hand That Feeds, and now rest supine, forbidden to play until my back heals or eventually snaps back into place. How do aging rockers do it? Keith Richards apparently has fresh blood transfused into his system every few years, but I believe his case is anomalous.

Playing at being a rock star does have that element of grasping at one’s lost youth to it. Like the elixir imbibed by Richards, rock rhythm and tonal inflections are absorbed by the air-guitarist and, for a brief time, he undergoes a spiritual transmutation — one that must be accomplished in privacy, of course, lest reality, or perchance an unfortunately placed mirror, dispel the glamour.

Much of the current literature — often found in blogs, of course — discusses video games and virtual worlds as a sort of escapism. The notion is that the unfulfilled middle-manager may find an emotional outlet for his work-induced frustrations in virtual games such as World of Warcraft, where everyone and anyone has the opportunity to be a hero.

I am not sure that this quite captures the phenomenon, however. With Rock Band, the goal is not so much to escape one’s reality but rather to participate in a different one — more of a pull than a push, so to speak. Plotinus, though in a different context, spoke of it as a hypostatic union; a union between oneself and a more perfect version of oneself; something that has the character of fulfilling one’s nature rather than erasing it.

Such is my feeling when playing air-guitar on the XBOX. I don’t want to be someone else, but instead simply want to release an aspect of myself that requires lowering the bar a bit, through technology, and closing the gap between myself and the gods of rock and roll. Surely this is the origin of all technology since Nimrod’s tower, brought down by Jehovah for its impertinence. Technology lifts us up, giving us health and the promise of immortality through medicine, cleverness through information technology, courage through online role-playing, and happiness through pharmacology. Technology removes our frailties, leaving us as we were meant to be before the fall: young and immortal.

Speaking of growing old, I attended a Bob Dylan concert a few months ago. Bob has been going through a bit of a revival, his U.S. tour coinciding with a documentary by Martin Scorsese and even a feature length biopic. I have not been able to find an adequate way to write about the event, which I and my companions walked out of. To say that Dylan couldn’t sing seems to be missing the point, since this charge has always been leveled against him. And to say that I did not like the new style he was playing also makes me sound like those who criticized Bob for going electric back in the day.

I might draw the contrast between Bob and Elvis Costello, who opened for him. Elvis played several songs solo, both old and new, only changing guitars on occasion to fit the piece. He was the Elvis I imagined, comfortable on stage, able to work the audience, singing in his off-key way to perfection. He was the archetype of the solo college musician, fitting complex lyrics into a structured form meant to invoke interlaced feelings of sympathy and alienation — the alienation of the artist — in his audience. Like a great beast, the audience responded to his coaxing, and he guided us through a peculiar journey and then deposited us gently when his set was over.

According to Jung, archetypes recur across cultures, and we seek them out, attempting to fill the recesses established in our minds. Most of all we seek heroes, not because we seek to be heroes ourselves, necessarily, but because we have an inchoate sense that there must always be heroes. Shakespeare provided heroes, sometimes twisted, sometimes broken, but heroes nonetheless, on the stage. The modern rock star transforms himself to fulfill this role established for him. The form required for the hero changes over time, and changes with context; Dylan was always able to adapt to these changing roles. He served as a cipher, reflecting the image that people required of him.

The Scorsese documentary, built around a ten hour interview with Dylan, portrays a man who sees none of the depth in his own lyrics that others impute to them. Perhaps he is simply being playful. He discusses a radio interview in which the host asks Dylan whether — actually, insists that — A Hard Rain’s a-Gonna Fall is about nuclear ash and the madness of nuclear proliferation, to which Dylan responds that the song is about rain, which is sometimes heavy, you know.

Again and again, however, Dylan was able to capture the spirit of his times, and people found in his lyrics answers to their questions and anxieties. My favorite Dylan album is Desire, which is a combination of electric protest songs and wild fantasies, all within a cowboy motif. I can’t say what questions he answers for me in this, but I do know that I return to it again and again, and it puts me in a happy place.

According to Giambattista Vico, the 18th century Neapolitan philologist, the secret of metaphor is that it is not based on similarity, but rather on identity. The secret of the hero is not that he makes himself resemble the classic hero. He becomes that hero, transforming himself as needed, internally, until identity is achieved. “The true war chief … is the Godfrey that Torquato Tasso imagines; and all the chiefs who do not conform throughout to Godfrey are not true chiefs of war” as Vico said.

Dylan accomplished something similar, adopting the argot of protest singers like Pete Seeger when this was required of him. As the times changed, he transformed himself again into the rebellious Bob, with dark glasses and an attitude, indifferent to the complaints about his going electric and “selling out,” when in fact his ear was simply better attuned than that of those around him. The Desire period marked him as a reclusive genius, when that was the thing to be. In the 80’s and 90’s, he identified himself with the Christian revival, while his music turned bluesy in a time when we wanted heroes whose rough voices were inhabited by old wisdom and hard-earned experience.

The new Dylan (which is your favorite Dylan?) doesn’t speak to me, however. He has a sort of be-bop band in conservative zoot-suits accompanying him, and the music matches the look. The music is pleasant enough, punctuated by Bob’s gravelly voice hammering out quick phrases like “dondenktwysanizalri” or “juzlikawama.” But what it means, and who Bob Dylan represents, is unclear to me. I hold out the possibility that he is simply ahead of the curve once again — but what sort of culture requires a hero who is mostly pleasant and barely comprehensible? Is this the zeitgeist of the 00’s?

Like my back, my plastic XBOX Stratocaster has also given out. After only a few hours of playing, the strummer is now mushy, and I am unable to get through the rapid 12-note riffs that seem to infest all the metal songs currently on my playlist like little roaches. I have gone online and found a fix that requires me to remove the back of the guitar (there are 20-odd screws, and I am grateful for the gift of a Christmas past: a power drill with bit attachments) and then reset a tension bar inside the guitar mechanism, but this seems to only work for about five hours before the strummer becomes mushy again. Apparently there is a known problem with the early Rock Band Stratocasters, and Activision is allowing people to send their bad guitars in for a replacement. Since I am currently in a state of rock star disability, I think I may take advantage of this, and with luck by the time my guitar is healed, my back will be, too.

SophiaBot: What I’ve been working on for the past month…

April 2, 2007 James Ashley

I have been busy in my basement constructing a robot with which I can have conversations and play games. Except that the robot is more of a program, and I didn’t build the whole thing up from scratch, but instead cobbled together pieces that other people have created. I took an Eliza-style interpreter written by Nicholas H. Tollervey (this is the conversation part) along with some scripted dialogs by Dr. Richard S. Wallace and threw it together with a Z-machine program written by Jason Follas, which allows my bot to play old Infocom games like Zork and The Hitchhiker’s Guide to the Galaxy. I then wrapped these up in a simple workflow and added some new Vista/.NET 3.0 speech recognition and speech synthesis code so the robot can understand me.
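
For readers who have not met an Eliza-style interpreter before, the basic trick can be sketched in a few lines: match the user's sentence against a list of patterns and echo part of it back inside a canned template. This is only an illustration in Python, not Mr. Tollervey's interpreter or Dr. Wallace's scripts; the two rules and the default reply are invented for the example.

```python
import re

# Hypothetical rules: (pattern, reply template). The captured text is
# dropped into the template to make the reply seem attentive.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return a canned reply built from the first rule that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY  # nothing matched, so keep the user talking


print(respond("I feel tired today."))
print(respond("I am stuck in a maze"))
print(respond("Hello"))
```

A real scripted bot has thousands of such patterns plus rules for swapping pronouns, but the loop of match, capture and fill-in is the same idea.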

I wrote an article about it for CodeProject, a very nice resource that allows developers from around the world to share their code and network. The site requires registration to download code, however, so if you want to play with the demo or look at the source code, you can also download them from this site.

Mr. Tollervey has a succinct article about the relationship between chatterbots and John Searle’s Chinese Room argument, which relieves me of the responsibility for discussing the same.

Instead, I’ll just add some quick instructions:

The application is made up of a text output screen, a text entry field, and a default enter button. The initial look and feel is that of an IBM XT theme (the first computer I ever played on). This can be changed using voice commands, which I will cover later. There are three menus initially available. The File menu allows the user to save a log of the conversation as a text file. The Select Voice menu allows the user to select from any of the synthetic voices installed on her machine. Vista initially comes with “Anna”. Windows XP comes with “Sam”. Other XP voices are available depending on which versions of Office have been installed over the lifetime of that particular instance of the OS. If the user is running Vista, then the Speech menu will allow him to toggle speech synthesis, dictation, and the context-free grammars. By doing so, the user will have the ability to speak to the application, as well as have the application speak back to him. If the user is running XP, then only speech synthesis is available, since some of the features provided by .NET 3.0 and consumed by this application do not work on XP.

The Appearance menu will let you change the look and feel of the text screen. I’ve also added some pre-made themes at the bottom of the Appearance menu. If, after chatting with SophiaBot for a while, you want to play a game, just type or say “Play game.” SophiaBot will present you with a list of the games available. You can add more, actually, simply by dropping additional game files you find on the internet into the Program Files\Imaginative Universal\SophiaBot\Game Data\DATA folder. (Jason’s Z-Machine implementation plays games that use version 3 and below of the game engine. I’m looking, rather lazily, into how to support later versions.) You can go here to download more Zork-type games. During a game, type or say “Quit” to end your session. “Save” and “Restore” keep track of your current position in the game, so you can come back later and pick up where you left off.

Speech recognition in Vista has two modes: dictation and context-free recognition. Dictation uses context, that is, an analysis of preceding words and words following a given target of speech recognition, in order to determine what word was intended by the speaker. Context-free speech recognition, by way of contrast, uses exact matches and some simple patterns in order to determine if certain words or phrases have been uttered. This makes context-free recognition particularly suited to command and control scenarios, while dictation is particularly suited to situations where we are simply attempting to translate the user’s utterances into text.
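To make the contrast concrete, here is a minimal sketch of context-free matching as described above: a fixed set of exact phrases plus one simple pattern with an open slot. It works on text that has already been transcribed, not on audio, and it is not the SophiaBot code or the Vista speech API; the names `EXACT`, `PATTERNS` and `recognize` are hypothetical, and the phrases are borrowed from the command list this application supports.

```python
import re

# Hypothetical command grammar: exact phrases plus one simple slot pattern.
EXACT = {"list colors", "list games", "list fonts"}
PATTERNS = [("change font to", re.compile(r"change font to (?P<value>.+)"))]


def recognize(utterance):
    """Return (command, argument) if the utterance fits the grammar, else None."""
    text = utterance.strip().lower()
    if text in EXACT:
        return (text, None)
    for command, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return (command, match.group("value"))
    # Anything outside the grammar is rejected rather than guessed at,
    # which is what makes this style suited to command and control.
    return None
```

A dictation engine, by contrast, would try to return some transcription for every utterance, using the surrounding words to choose among candidates.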

You should begin by trying to start up a conversation with Sophia using the textbox, just to see how it works, as well as her limitations as a conversationalist. Sophia uses certain tricks to appear more lifelike. She throws out random typos, for one thing. She also is a bit slower than a computer should really be. This is because one of the things that distinguish computers from people is the way they process information — computers do it quickly, and people do it at a more leisurely pace. By typing slowly, Sophia helps the user maintain his suspension of disbelief. Finally, if a text-to-speech engine is installed on your computer, Sophia reads along as she types out her responses. I’m not certain why this is effective, but it is how computer terminals are shown to communicate in the movies, and it seems to work well here, also. I will go over how this illusion is created below.
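As a rough illustration of the slow-typing trick, the sketch below turns a response into a list of keystrokes with a delay attached to each, occasionally inserting a wrong letter. This is not how SophiaBot is implemented; in particular, erasing each typo with a backspace is an assumption of mine, and `typing_plan`, `render`, the typo rate and the delay bounds are all hypothetical.

```python
import random


def typing_plan(text, typo_rate=0.05, min_delay=0.05, max_delay=0.25, rng=None):
    """Return a list of (character, delay_in_seconds) keystrokes for text."""
    rng = rng or random.Random()
    plan = []
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            # Assumption: a typo is a random letter followed by a backspace.
            wrong = rng.choice("abcdefghijklmnopqrstuvwxyz")
            plan.append((wrong, rng.uniform(min_delay, max_delay)))
            plan.append(("\b", rng.uniform(min_delay, max_delay)))
        plan.append((ch, rng.uniform(min_delay, max_delay)))
    return plan


def render(plan):
    """Apply the keystrokes, honoring backspaces, and return the final text."""
    out = []
    for ch, _delay in plan:
        if ch == "\b":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)
```

A display loop would walk the plan, pause for each delay, and write each character to the screen; `render` only confirms that the text left on screen matches what was meant.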

In Command\AIML\Game Lexicon mode, the application generates several grammar rules that help direct speech recognition toward certain expected results. Be forewarned: initially loading the AIML grammars takes about two minutes, and occurs in the background. You can continue to touch type conversations with Sophia until the speech recognition engine has finished loading the grammars and speech recognition is available. Using the command grammar, the user can make the computer do the following things: LIST COLORS, LIST GAMES, LIST FONTS, CHANGE FONT TO…, CHANGE FONT COLOR TO…, CHANGE BACKGROUND COLOR TO…. Besides the IBM XT color scheme, a black papyrus font on a linen background also looks very nice. To see a complete list of keywords used by the text-adventure game you have chosen, say “LIST GAME KEYWORDS.” When the game is initially selected, a new set of rules is created based on different two-word combinations of the keywords recognized by the game, in order to help speech recognition by narrowing down the total number of phrases it must look for.
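The two-word rule generation can be sketched as follows. I am assuming here that every ordered pair of distinct keywords becomes a candidate phrase; the application may well be more selective (pairing verbs only with nouns, for example). The function name and the sample keywords are hypothetical.

```python
from itertools import permutations


def two_word_phrases(keywords):
    """Return every ordered pair of distinct keywords as a candidate phrase."""
    unique = sorted({k.lower() for k in keywords})
    return [f"{first} {second}" for first, second in permutations(unique, 2)]


# Sample keywords of the kind a text adventure might recognize.
phrases = two_word_phrases(["open", "mailbox", "take", "lamp"])
```

Four keywords yield twelve phrases, so the list grows roughly with the square of the vocabulary. It is still far smaller than the open-ended search dictation has to perform, which is the point of narrowing the grammar.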

In dictation mode, the underlying speech engine simply converts your speech into words and has the core SophiaBot code process it in the same manner that it processes text that is typed in. Dictation mode is sometimes better than context-free mode for non-game speech recognition, depending on how well the speech recognition engine installed on your OS has been trained to understand your speech patterns. Context-free mode is typically better for game mode. Command and control only works in context-free mode.

48% of Americans Reject Darwinian Evolution

April 1, 2007 James Ashley 1 Comment

A new Newsweek poll reveals frightening data about the curious disjunct between faith and science among Americans. Pundits have attributed these results to anything from poor science education in pre-K programs to global warming. According to the poll, while 51% of Americans still subscribe to Darwin’s theory of gradual evolution through adaptation, an amazing 42% continue to cleave to Lamarckianism, while only 6% believe in Punctuated Equilibrium. 1% remain uncommitted and are waiting to hear more before they come to a final decision.

This has led me to wonder what else Americans believe:

The 2002 Roper Poll found that 48% of Americans believe in UFOs, while 37% believe that there has been firsthand contact between aliens and humans. 25% of Americans believe in alien abductions, while approximately 33% believe that humans are the only intelligent life in the universe, and that all the UFO stuff is bunk.

The 33% of people who subscribe to the anthropocentric view of the universe corresponds numerically with the 33% of Americans who opposed the recent deadline for troop withdrawal from Iraq (Pew Research Center poll). According to the Gallup poll, in 1996 33% of Americans thought they would become rich someday. By 2003, this number had dropped to 31%. According to a Scripps Howard/Ohio University poll, 33% of the American public suspects that federal officials assisted in the 9/11 terrorist attacks or took no action to stop them so the United States could go to war in the Middle East. A Harris poll discovered that in 2004, 33% of adult Americans considered themselves Democrats.

Pew says that as of 2004, 33 million American internet users had reviewed or rated something as part of an online rating system. 33 million Americans were living in poverty in 2001, according to the U.S. Census Bureau. According to Pew, in 2006 33 million Americans had heard of VOIP. Each year, 33 million Americans use mental health services or services to treat their problems and illnesses resulting from alcohol, inappropriate use of prescription medications, or illegal drugs. The New York Times says that out of 33 countries, Americans are least likely to believe in evolution. Researchers estimate that 33% of Americans born in 2000 will develop diabetes. In the same year, 33 million Americans lost their jobs.

CBS pollsters discovered that 22% of Americans have seen or felt the presence of a ghost. 48% believe in ghosts. ICR says 48% of Americans oppose embryonic stem-cell research. CBS finds that 61% support embryonic stem-cell research. There is no poll data available on whether they believe that embryos used for stem-cell research will one day become ghosts themselves.

82% of Americans believe that global warming is occurring, according to Fox News/Opinion Dynamics. 79% believe people’s behavior has contributed to global warming. 89% do not believe the U.S. government staged or faked the Apollo moon landing, according to Gallup. Gallup also found that 41% of Americans believe in ESP, 25% believe in Astrology, 20% believe in reincarnation, while only 9% believe in channeling. A USA TODAY/ABC News/Stanford University Medical Center poll found that 5% of American adults have turned to acupuncture for pain relief.

According to Gallup, 44% of Americans go out of their way to see movies starring Tom Hanks. 34% go out of their way to avoid movies starring Tom Cruise. Only 18% go out of their way to avoid Angelina Jolie.

Do Computers Read Electric Books?

March 1, 2007 James Ashley 11 Comments

In the comments section of a blog I like to frequent, I have been pointed to an article in the International Herald about Pierre Bayard’s new book, How to Talk About Books You Haven’t Read.

Bayard recommends strategies such as abstractly praising the book, offering silent empathy regarding someone else’s love for the book, discussing other books related to the book in question, and finally simply talking about oneself. Additionally, one can usually glean enough information from reviews, book jackets and gossip to sustain the discussion for quite a while.

Students, he noted from experience, are skilled at opining about books they have not read, building on elements he may have provided them in a lecture. This approach can also work in the more exposed arena of social gatherings: the book’s cover, reviews and other public reaction to it, gossip about the author and even the ongoing conversation can all provide food for sounding informed.

I’ve recently been looking through some AI experiments built on language scripts, based on the 1966 software program Eliza, which used a small script of canned questions to maintain a conversation with computer users. You can play a web version of Eliza here, if you wish. It should be pointed out that the principles behind Eliza are the same as those that underpin the famous Turing Test. Turing proposed answering the question “Can machines think?” by staging an ongoing experiment to see if machines can imitate thinking. The proposal was made in his 1950 paper Computing Machinery and Intelligence:

The new form of the problem can be described in terms of a game which we call the “imitation game.” It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either “X is A and Y is B” or “X is B and Y is A.” The interrogator is allowed to put questions to A and B thus:

C: Will X please tell me the length of his or her hair?

Now suppose X is actually A, then A must answer. It is A’s object in the game to try and cause C to make the wrong identification. His answer might therefore be:

“My hair is shingled, and the longest strands are about nine inches long.”

In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as “I am the woman, don’t listen to him!” to her answers, but it will avail nothing as the man can make similar remarks.

We now ask the question, “What will happen when a machine takes the part of A in this game?” Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, “Can machines think?”

The standard form of the current Turing experiments is something called a chatterbox application. Chatterboxes abstract the mechanism for generating dialog from the dialog scripts themselves by utilizing a set of rules written in a common format. The most popular format happens to be an XML standard called AIML (Artificial Intelligence Markup Language).
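To show what separating the rules from the mechanism looks like, here is a minimal sketch in the spirit of AIML, where each rule pairs an input pattern with a response template and `*` stands for any run of words. It is a simplified stand-in, not an AIML interpreter: real AIML is written in XML and has many more features. The rules themselves, and the names `RULES`, `to_regex` and `respond`, are hypothetical.

```python
import re

# Hypothetical rules: (pattern, template). "*" matches one or more words,
# and {star} in the template is replaced by whatever the "*" matched.
RULES = [
    ("HAVE YOU READ *", "I have heard a great deal about {star}. What did you think of it?"),
    ("HELLO", "Hello. What have you been reading lately?"),
    ("*", "That reminds me of something I read in a review."),
]


def to_regex(pattern):
    """Compile a rule pattern, turning each "*" into a capturing wildcard."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("(.+)".join(parts))


def respond(user_input):
    """Normalize the input and answer with the first rule that matches."""
    text = re.sub(r"[^\w\s]", "", user_input).upper()
    text = " ".join(text.split())
    for pattern, template in RULES:
        match = to_regex(pattern).fullmatch(text)
        if match:
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return None
```

Changing the conversation means editing `RULES` only; `respond` stays the same, which is the abstraction a shared rule format buys.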

What I’m interested in, at the moment, is not so much whether I can write a script that will fool people into thinking they are talking with a real person, but rather whether I can write a script that makes small talk by discussing the latest book. If I can do this, it should validate Pierre Bayard’s proposal, if not Alan Turing’s.

Concerning Ladders

February 6, 2007 James Ashley 2 Comments

It is a commonplace that humor resists translation. This was Pevear and Volokhonsky’s conceit when they came out with a new translation of The Brothers Karamazov in 1990, which they claimed finally brought across (successfully, I think) the deep humor of Dostoevsky’s masterpiece. While accuracy is the goal in most translation efforts, to hold to accuracy when translating humor unavoidably leaves much untranslated. Thus in translating Lewis Carroll into Russian, Vladimir Nabokov chose to replace English wordplay with completely different puns sensible only to the Russian speaker, all the better to capture the flavor of Carroll’s humor.

A former colleague at the infamous Turner Studios has posted the following joke to his blog, which I must admit I cannot decipher:

Yet I know it is a joke, because he adds the following gloss to the image:

Let the hilarity ensue. Someone put up a site where you can build your own World of Warcraft talent trees. The priest one made me laugh, since it’s twoo, it’s twoo.

What is one required to know in order to decipher this particular joke? Initially, of course, one must know that this is an artifact of the online game World of Warcraft, which is a complex virtual world people pay a monthly fee in order to gain access to. Next, the artifact is a “talent tree”, which describes different abilities people can gain through accruing time in the virtual world. The various talents form a tree in the sense that one must gain lower level talents before one may achieve the higher talents, and while the low level talents form a broad base, there are fewer high level talents to choose from when one gets to the top. The talents one chooses to acquire, in turn, determine what sort of person one is in the virtual World of Warcraft.

This is the formal aspect of the talent tree. In order to understand the hilarity of this particular talent tree, however, one must further understand the pictorial vocabulary used to represent talents in this tree, a task requiring a Rosetta stone of sorts. The use of pictures to tell stories is old, and certainly predates any written languages, with exemplars such as the cave drawings at Lascaux. Long after the advent of written languages, images continued to exert a central role in the telling of stories and the transmission of culture in societies where the majority of people were illiterate. It was even the main way that Christianity promulgated its teachings to the masses, and the eventual eclipse of the central role of images in religious life by way of the Protestant Reformation can be seen as a direct result of the emphasis placed on reading the Bible for oneself, and hence the importance of literacy.

Beyond pratfalls and scatology, I’m not sure that pictures without words are a particularly effective means of transmitting humor. The talent tree for the priest represented above has less to do with cave paintings at Lascaux than with the Renaissance emblem book tradition, which does attempt to treat images as language, and reached its height of artistic expression with the HYPNEROTOMACHIA POLIPHILI. The traditional emblem book was made up of a series of 100 or so images that were explicated with poems and allegories. What sets them apart from instructive religious images is that they require a high level of literacy in order to read and enjoy, whereas religious images during the same period were particularly useful for the illiterate. In some cases, due to the expense of printing woodcuts, emblem books would even forgo actual images and instead would include mere descriptions of the emblems being explicated. Implicit in all of this, however, was the understanding that whatever could be said about the emblems was originally and overabundantly expressed in the images themselves, and that the accompanying text merely offered a glimpse into their hidden meanings.

Athanasius Kircher, the 17th century polymath, pursued a similar approach toward deciphering Egyptian hieroglyphs. An interesting website dedicated to and to some extent influenced by his work can be found here. Following the work of 19th century Egyptologists like Jean-François Champollion, we know today that Egyptian hieroglyphs alternately represent either phonetic elements or words, depending on how they are used. In the case of cartouches, the series of symbols often found on monuments and usually placed in an oval in order to set them apart, hieroglyphs were exclusively a phonetic alphabet used to spell out the personal names of Egyptian dignitaries. For Kircher, however, they represented a language of images which, if not actually magical, were at least possessed of superabundant and secret meaning. Kircher sought transcendence in his efforts to cull meaning from cartouches. How far he fell short can be gathered from this gloss by Umberto Eco in The Search For The Perfect Language:

>Out of this passion for the occult came those attempts at decipherment which now amuse Egyptologists. On page 557 of his Obeliscus Pamphylius, figures 20-4 reproduce the images of a cartouche to which Kircher gives the following reading: ‘the originator of all fecundity and vegetation is Osiris whose generative power bears from heaven to his kingdom the Sacred Mophtha.’ This same image was deciphered by Champollion (Lettre a Dacier, 29), who used Kircher’s own reproductions, as ‘AOTKRTA (Autocrat or Emperor) son of the sun and sovereign of the crown, Caesar Domitian Augustus’. The difference is, to say the least, notable, especially as regards the mysterious Mophtha, figured as a lion, over which Kircher expended pages and pages of mystic exegesis listing its numerous properties, while for Champollion the lion simply stands for the Greek letter lambda.

Whereas Kircher’s search for transcendence requires great learning, the icon to the right, of the Ladder of Divine Ascent, is accessible to the illiterate. Most icons in the Eastern Orthodox tradition are of saints, and are used in prayer to the saints. Icons that depict stories, such as this icon, are somewhat rare, though there is evidence that this was in fact the prior tradition, and the earliest Christian images, found in the catacombs of Rome, typically depict stories from the Bible. The icon of the Ladder of Divine Ascent is based on the ladder described by John Climacus in the 7th century book of the same name, and the Orthodox saint can in fact be found at the lower left corner of the image. Climacus, in turn, borrowed his ladder from the image of a ladder that Jacob dreamed about, a ladder extending from earth to heaven. In this icon, Christ stands at the top of the ladder, welcoming anyone who can make the full ascent. At the bottom are monks lining up to attempt the climb, while in between we see ascetics being diverted, distracted, and pulled off of the ladder by demonic beings. The message is fairly straightforward. Transcendence and salvation are possible, but very difficult. The ladder represents the journey, but also the mediation required to ascend from the chthonic to the celestial.

I find the metaphor of the ladder striking because 1) it is man-made, and 2) it is something that one steps off of when one reaches the top. These two features explain why the talent tree depicted above could never be a talent ladder, even though both are things that one climbs. The tree is something made of the same earthly material that it grows out of. It reaches for the sky, but because it is not truly a mediator, it cannot allow one to step off of it, and in fact the higher one climbs, the less stable one’s purchase is. Just as a pier is a disappointed bridge, as James Joyce indicated, a tree is a disappointed ladder. It goes nowhere.

This, I take it, is the humor inherent in the talent tree above. The talent tree provides a semblance of movement upwards, but ultimately disappoints. It always provides more, but the more turns out to be more of the same. For an interesting unpacking of this phenomenon, one could do worse than read this cautionary blog about the dangers of playing World of Warcraft:

>60 levels, 30+ epics, a few really good “real life” friends, a seat on the oldest and largest guild on our server’s council, 70+ days “/played,” and one “real” year later…
…
It took a huge personal toll on me. To illustrate the impact it had, let’s look at me one year later. When I started playing, I was working towards getting into the best shape of my life (and making good progress, too). Now a year later, I’m about 30 pounds heavier that I was back then, and it is not muscle. I had a lot of hobbies including DJing (which I was pretty accomplished at) and music as well as writing and martial arts. I haven’t touched a record or my guitar for over a year and I think if I tried any Kung Fu my gut would throw my back out. Finally, and most significantly, I had a very satisfying social life before.
…
These changes are miniscule, however, compared to what has happened in quite a few other people’s lives. Some background… Blizzard created a game that you simply can not win. Not only that, the only way to “get better” is to play more and more. In order to progress, you have to farm your little heart out in one way or another: either weeks at a time PvPing to make your rank or weeks at a time getting materials for and “conquering” raid instances, or dungeons where you get “epic loot” (pixilated things that increase your abilities, therefore making you “better”). And what do you do after these mighty dungeons fall before you and your friend’s wrath? Go back the next week (not sooner, Blizzard made sure you can only raid the best instances once a week) and do it again (imagine if Alexander the Great had to push across the Middle East every damn week).

The burden of Sisyphus is a perennial staple of humorists, and not a tragedy at all. Consider the most famous Laurel and Hardy short, The Music Box, in which the conceit of the whole film is the two bunglers trying to move a piano to a house on top of a hill. Perhaps the most iconic example of this sort of humor is Nigel’s amplifier from This Is Spinal Tap, which “goes to eleven”. For Nigel, eleven is a transcendent level of amplification, while for the mock interviewer, it is just one more number. Why not just re-calibrate the amplifier and make ten eleven? Nigel believes that eleven transforms the amplifier into a ladder, whereas the audience recognizes that it is just a tree.

I am at a point in my life where I see trees and ladders everywhere. For instance, the constant philosophical debates around the mind-body problem can be broken down into a simple question about whether consciousness is a tree or a ladder. If consciousness is the complex accumulation of basically simple brain processes, then it is a tree. If aggregating various physical processes never can achieve true consciousness, then consciousness is a ladder. And then from these two basic theses, we can arrive at all the other combinations of mind-body solutions, for instance that it is a tree that thinks it is a ladder, or a ladder that thinks it is a tree, or that ladder and tree are simply two equivalent modes of describing the same phenomenon, depending possibly on whether one is in fact a tree or a ladder.

Science fiction plots, in turn, can be broken down into two types: those in which ladders pretend to be trees, and those in which trees pretend to be ladders. Virtual worlds, finally, are the culmination of a historical weariness over these problems, and a consequent ambivalence about whether trees and ladders make any difference, anymore. For those who have chosen to forgo the search for ladders, virtual worlds provide a world of trees, which simulate the experience of climbing ladders — virtual ladders, so to speak.

Having had several years of success, Blizzard, the makers of World of Warcraft, have recently released a new expansion to their online world called The Burning Crusade. Whereas up to this point, players have been limited to a maximum level of 60, those who buy The Burning Crusade will have that ceiling lifted. With The Burning Crusade, World of Warcraft goes to level 70.