Tag Archives: Kinect

Kinect developer preview for Windows 10

 

unicorn

With kind permission from the Product Group, I am reprinting this from an email that went out to the Kinect developers mailing list.

Please also check out Mike Taulty’s blog for a walkthrough of what’s provided. The short version, though, is that you can now access the Perception APIs to work with Kinect v2 in a UWP app, as well as use your Kinect v2 for face recognition as a replacement for your password. Please bear in mind that this is the public preview rather than a final release.

Snip >>

We are happy to announce our public preview of Kinect support for Windows 10.

 

This preview adds support for using Kinect with the built-in Windows.Devices.Perception APIs, and it also provides beta support for using Kinect with Windows Hello.

 

Getting started is easy. First, make sure you already have a working Kinect for Windows V2 attached to your Windows 10 PC.  The Kinect Configuration Verifier can make sure everything is functioning okay. Also, make sure your Kinect has a good view of your face –  we recommend centering it as close to the top or bottom of your monitor as possible, and at least 0.5 meters from your face.

 

Then follow these steps to install the Kinect developer preview for Windows 10:

 

1. The first step is to opt in to driver flighting. You can follow the instructions here to set up your registry by hand, or you can use the following text to create a .reg file, then right-click it and import the settings:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\DriverFlighting\Partner]

"TargetRing"="Drivers"

 

2. Next, you can use Device Manager to update to the preview version of the Kinect driver and runtime:

 

    1. Open Device Manager (Windows key + x, then m).

    2. Expand “Kinect sensor devices”.

    3. Right-click on “WDF KinectSensor Interface 0”.

    4. Click “Update Driver Software…”

    5. Click “Search automatically for updated driver software”.

    6. Allow it to download and install the new driver.

    7. Reboot.
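If you would rather generate the .reg file from step 1 than type it by hand, something like the following Python sketch should do. The key and value are copied from the text above; the function names and the choice of UTF-16 encoding are assumptions made for illustration, not part of the original email.

```python
from pathlib import Path

# Key and value are copied from step 1; everything else here is illustrative.
REG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\DriverFlighting\Partner"
REG_VALUES = {"TargetRing": "Drivers"}


def build_reg_text(key, values):
    """Return the text of a .reg file that sets string values under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, data in values.items():
        # .reg string values need plain ASCII double quotes, not curly quotes.
        lines.append(f'"{name}"="{data}"')
    lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path):
    """Write the opt-in settings to a file path chosen by the caller."""
    # Assumption: UTF-16 with a byte-order mark, the usual encoding for 5.00 files.
    Path(path).write_bytes(build_reg_text(REG_KEY, REG_VALUES).encode("utf-16"))
```

Calling `write_reg_file("driver_flighting.reg")` (a hypothetical file name) produces a file you can right-click and merge, exactly as with a hand-made one.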

 

Once you have the preview version of the Kinect for Windows V2 driver (version 2.1.1511.11000 or higher), you can start developing sensor apps for Windows 10 using Visual Studio 2015 with Windows developer tools. You can also set up Windows Hello to log in using your Kinect.

 

1. Go to “Settings->Accounts->Sign-in options”.

2. Add a PIN if you haven’t already.

3. In the Windows Hello section, click “Set up” and follow the instructions to enable Windows Hello!

 

That’s it! You can send us your feedback at k4w@microsoft.com.

The Problem with Comparing Depth Camera Resolutions

We all want to have an easy way to compare different depth cameras to one another. Where we often stumble in comparing depth cameras, however, is in making the mistake of thinking of them in the same way we think of color cameras or color displays.

When we go to buy a color television or computer monitor, for instance, we look to the pixel density in order to determine the best value. A display that supports 1920 by 1080 has roughly 2.25 times the pixel density of a 1280 by 720 display. The first is considered high definition resolution while the second is commonly thought of as standard definition. From this, we have a rule of thumb that HD is 2.25 times denser than SD. With digital cameras, we similarly look to pixel density in order to compare value. A 4 megapixel camera is roughly twice as good as a 2 megapixel camera, while an 8 MP camera is four times as good. There are always other factors involved, but for quick evaluations the pixel density trick seems to work. My phone happens to have a 41 MP camera and I don’t know what to do with all those extra megapixels – all I know is that it is over 20 times as good as that 2 megapixel camera I used to have and that makes me happy.

When Microsoft’s Kinect 2 sensor came out, it was tempting to compare it against the Kinect v1 in a similar way: by using pixel density. The Kinect v1 depth camera had a resolution of 320 by 240 depth pixels. The Kinect 2 depth camera, on the other hand, had an increased resolution of 512 by 424 depth pixels. Comparing the total depth pixels provided by the Kinect v1 to the total provided by the Kinect 2, 76,800 vs 217,088, many people arrived at the conclusion that the Kinect 2’s depth camera was roughly three times better than the Kinect v1’s.

Another feature of the Kinect 2 is a greater field of view for the depth camera. Where the Kinect v1 has a field of view of 57 degrees by 43 degrees, the Kinect 2 has a 70 by 60 degree field of view. The new Intel RealSense 3D F200 camera, in turn, advertises an improved depth resolution of 480 by 360 depth pixels with an increased field of view of roughly 90 degrees by 72 degrees.

What often gets lost in these feature comparisons is that our two different depth camera attributes, resolution and field of view, can actually affect each other. Increased pixel resolution is only really meaningful if the field of view stays the same between different cameras. If we increase the field of view, however, we are in effect diluting the resolution of each pixel by trying to stuff more of the real world into the pixels we already have.

It turns out that 3D math works slightly differently from regular 2D math. To understand this better, imagine a sheet of cardboard held a meter out in front of each of our two Kinect sensors. How much of each sheet is actually caught by the Kinect v1 and the Kinect 2?

measurement

To derive the area of the inner rectangle captured by the Kinect v1 in the diagram above, we will use a bit of trigonometry. The field of view of the Kinect v1 is 57 degrees horizontal by 43 vertical. To get right triangles to work with, however, we will need to bisect these angles. For instance, half of 43 is 21.5. The tangent of 21.5 degrees times the 1 meter adjacent side (since the cardboard sheet is 1 meter away) gives us an opposite side of .39 meters. Since this is only half of that rectangle’s side (because we bisected the angle) we multiply by two to get the full vertical side, which is .78 meters. Using the same technique for the horizontal field of view, we capture a horizontal side of 1.09 meters.

Using the same method for the sheet of cardboard in front of the Kinect 2, we discover that the Kinect 2 captures a rectangular surface that is 1.4 meters by 1.15 meters. If we now calculate the area on the cardboard sheets in front of each camera and divide by each camera’s resolution, we discover that far from being three times better than the Kinect v1, the Kinect 2 is only about 1.5 times denser: each pixel caught by the Kinect v1 depth camera holds roughly 1.5 times as much of the real world as each pixel of the Kinect 2. It is still a better camera, but not what one would think by comparing resolutions alone.
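The cardboard arithmetic can be reproduced in a few lines of Python. This is only a sketch: it assumes an idealized camera looking straight at a flat sheet, it uses the resolutions and the 57 by 43 and 70 by 60 degree fields of view quoted earlier in this post, and the names `CAMERAS`, `footprint` and `area_per_pixel` are made up for illustration.

```python
import math

# Resolution (pixels) and field of view (degrees) as quoted in this post.
CAMERAS = {
    "Kinect v1": {"pixels": (320, 240), "fov_deg": (57.0, 43.0)},
    "Kinect 2": {"pixels": (512, 424), "fov_deg": (70.0, 60.0)},
}


def footprint(fov_deg, distance_m=1.0):
    """Width and height (meters) of the flat rectangle seen at distance_m."""
    h_deg, v_deg = fov_deg
    # Bisect each angle to get a right triangle, then double the opposite side.
    width = 2 * distance_m * math.tan(math.radians(h_deg / 2))
    height = 2 * distance_m * math.tan(math.radians(v_deg / 2))
    return width, height


def area_per_pixel(camera, distance_m=1.0):
    """Average area of the sheet (square meters) covered by one depth pixel."""
    width, height = footprint(camera["fov_deg"], distance_m)
    cols, rows = camera["pixels"]
    return (width * height) / (cols * rows)


if __name__ == "__main__":
    for name, camera in CAMERAS.items():
        w, h = footprint(camera["fov_deg"])
        mm2 = area_per_pixel(camera) * 1e6  # square meters to square millimeters
        print(f"{name}: {w:.2f} m x {h:.2f} m, {mm2:.2f} mm^2 per pixel")
    ratio = area_per_pixel(CAMERAS["Kinect v1"]) / area_per_pixel(CAMERAS["Kinect 2"])
    print(f"A Kinect v1 pixel covers {ratio:.2f}x the area of a Kinect 2 pixel")
```

Adding another entry to `CAMERAS` with a different camera's advertised resolution and field of view is enough to run the same comparison for it.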

This was actually a lot of math in order to make a simple and mundane point: it all depends. Depth pixel resolutions do not tell us everything we need to know when comparing different depth cameras. I invite the reader to compare the true density of the RealSense 3D camera to the Kinect 2 or Xtion Pro Live camera if she would like.

On the other hand, it might be worth considering the range of these different cameras. The RealSense F200 cuts off at about a meter whereas the Kinect cameras only start performing really well at about that distance. Another factor is, of course, the accuracy of the depth information each camera provides. A third factor is whether one can improve the performance of a camera by throwing on more hardware. Because the Kinect 2 is GPU bound, it will actually work better if you simply add a better graphics card.

For me, personally, the most important question will always be how good the SDK is and how strong the community around the device is. With good language and community support, even a low quality depth camera can be made to do amazing things. An extremely high resolution depth camera with a weak SDK, alternatively, might in turn make a better paperweight than a feature forward technology solution.

[I’d like to express my gratitude to Kinect for Windows MVPs Matteo Valoriani and Vincent Guigui for introducing me to this geometric bagatelle.]

Come hear me speak about Mixed Reality at Dragon Con 2015

 dragoncon_logo

I’ve been invited by the Robotics and Maker Track to speak about near future technologies at Dragon Con this year. While the title of the talk is “Microsoft Kinect and HoloLens,” I’ll actually be talking more broadly about 3D sensors like Kinect and the Orbbec Astra, Virtual Reality with the Oculus Rift and HTC Vive as well as Augmented Reality with HoloLens and Magic Leap. I will cover how these technologies will shape our lives and potentially change our world over the next five years.

I am honored to have been asked to be a panelist at Dragon Con on technology I am passionate about and that has been a large part of my life and work over the past several years.

I should add that being a panelist at Dragon Con is a nerd and fan’s freakin’ dream come true for me. Insanely so. Hopefully I’ll be able to stay cool enough to get through all the material I have on our collective sci fi future.

oculus

I will cover each technology and the devices coming out in the areas of 3D sensors, virtual reality and augmented reality. I’ll discuss their potential impact as well as some of their history. I’ll delve into some of the underlying technical and commercial challenges that face each. I’ll bring lots of Kinect and Oculus demos (not allowed to show HoloLens for now, unfortunately) and will also provide practical advice on how to experience these technologies as a consumer as well as a developer in 2016.

kinect

My panel is on Sunday, Sept 6 at 2:30 in Savannah rooms 1, 2 and 3 in the Sheraton. Please come say hi!

 holo

The HoloCoder’s Bookshelf


Professions are held together by touchstones such as a common jargon that both excludes outsiders and reinforces the sense of inclusion among insiders based on mastery of the jargon. On this level, software development has managed to surpass more traditional practices such as medicine, law or business in its ability to generate new vocabulary and maintain a sense that those who lack competence in using the jargon simply lack competence. Perhaps it is part and parcel with new fields such as software development that even practitioners of the common jargon do not always understand each other or agree on what the terms of their profession mean. Stack Overflow, in many cases, serves merely as a giant professional dictionary in progress as developers argue over what they mean by de-coupling, separation of concerns, pragmatism, architecture, elegance, and code smell.

Cultures, unlike professions, are held together not only by jargon but also by shared ideas and philosophies that delineate what is important to the tribe and what is not. Between a profession and a culture, the members of a professional culture, in turn, share a common imaginative world that allows them to discuss shared concepts in the same way that other people might discuss their favorite TV shows.

This post is an experiment to see what the shared library of augmented reality and virtual reality developers might one day look like. Digital reality development is a profession that currently does not really exist but which is already being predicted to be a multi-billion dollar industry by 2020.

HoloCoding, in other words, is a profession that exists only virtually for now. As a profession, it will envelop concerns much greater than those considered by today’s software developers. Whereas contemporary software development is mostly about collecting data, reporting on data and moving data from point A to points B and C, spatial software development will be more concerned with environments and will have to draw on complex mathematics as well as design and experiential psychology. The bookshelf of a holocoder will look remarkably different from that of a modern data coder. Here are a few ideas regarding what I would expect to find on a future developer’s bookshelf in five to ten years.

 

1. Understanding Media by Marshall McLuhan – written in the 60’s and responsible for concepts such as ‘the global village’ and hot versus cool media, McLuhan pioneered the field of media theory.  Because AR and VR are essentially new media, this book is required reading for understanding how these technologies stand side-by-side with or perhaps will supplant older media.

2. Illuminations by Walter Benjamin – while the whole work is great, the essay ‘The Work of Art in the Age of Mechanical Reproduction’ is a must read for discussing how traditional notions about creativity fit into the modern world of print and now digital reproduction (which Benjamin did not even know about). It also deals at an advanced level with how human interactions work on stage versus film and the strange effect this creates.

3. Sketching User Experiences by Bill Buxton – this classic was quickly adopted by web designers when it came out. What is sometimes forgotten is that the book largely covers the design of products and not websites or print media – products like those that can be built with HoloLens, Magic Leap and Oculus Rift. Full of insights, Buxton helps his readers to see the importance of lived experience when we design and build technology.

4. Bergsonism by Gilles Deleuze – though Deleuze is probably most famous for his collaborations with Felix Guattari, this work on the philosophical meaning of the term ‘virtual reality’, not as a technology but rather as a way of approaching the world, is a gem.

5. Passwords by Jean Baudrillard – what Deleuze does for virtual reality, Baudrillard does for other artifacts of technological language in order to show their place in our mental cosmology. He also discusses virtual reality along the way, though not as thoroughly.

6. Mathematics for 3D Game Programming and Computer Graphics by Eric Lengyel – this is hardcore math. You will need this. You can buy it used online for about $6. Go do that now.

7. Linear Algebra and Matrix Theory by Robert Stoll – this is a really hard book. Read the Lengyel before trying this. This book will hurt you, by the way. After struggling with a page of this book, some people end up buying the Manga Guide to Matrix Theory thinking that there is a fun way to learn matrix math. Unfortunately, there isn’t and they always come back to this one.

8. Phenomenology of Perception by Maurice Merleau-Ponty – when it first came out, this work was often seen as an imitation of Heidegger’s Being and Time. It may be the case that it can only be truly appreciated today when it has become much clearer, thanks to years of psychological research, that the mind reconstructs not only the visual world for us but even the physical world and our perception of 3D spaces. Merleau-Ponty pointed this out decades ago and moreover provides a phenomenology of our physical relationship to the world around us that will become vitally important to anyone trying to understand what happens when more and more of our external world becomes digitized through virtual and augmented reality technologies.

9. Philosophers Explore the Matrix – just as The Matrix is essential viewing for anyone in this field, this collection of essays is essential reading. This is the best treatment available of a pop theme being explored by real philosophers – actually most of the top American philosophers working on theories of consciousness in the 90s. Did you ever think to yourself that The Matrix raised important questions about reality, identity and consciousness? These professional philosophers agree with you.

10. Snow Crash by Neal Stephenson – sometimes to understand a technology, we must extrapolate and imagine how that technology would affect society if it were culturally pervasive and physically ubiquitous. Fortunately Neal Stephenson did that for virtual reality in this amazing book that combines cultural history, computer theory and a fast paced adventure.

One Kinect to rule them all: Kinect 2 for Xbox One

two_kinects

Yes. That’s a bit of a confusing title, but it seems best to lay out the complexity upfront. So far there have been two generations of the Kinect sensor, each of which combines a color camera, a depth sensing camera, an infrared emitter (basically used for the depth sensing camera) and a microphone array which works as a virtual directional shotgun microphone. Additional software called the Kinect SDK then allows you to write programs that read these data feeds and interpolate them into 3D animated bodies that represent people’s movements.

Microsoft has just announced that they will stop producing separate versions of the Kinect v2, one for Windows and one for the Xbox One, but will instead encourage developers to purchase the Kinect for Windows Adapter to plug their Kinects for Xbox One into a PC. In fact, the adapter has been available since last year, but this just makes it official. All in all this is a good thing. With the promise that Universal Windows Apps will be portable to Xbox, it makes much more sense if the sensors – and more importantly the firmware installed on them – are exactly the same whether you are on a PC running Windows 8/10 or an Xbox running Xbox OS.

This announcement also vastly simplifies the overall Kinect hardware story. Up to this point, there weren’t just two generations of Kinect hardware but also two versions of the current Kinect v2 hardware, one for the Xbox and one for Windows (for a total of four different devices). The Kinect hardware, both in 2010 and in 2013, has always been built first as a gaming device. In each case, it was then adapted to be used on Windows machines, in 2012 and 2014 respectively.

The now discontinued Kinect for Windows v2 differed from the Kinect for the Xbox One in both hardware and software. To work with Windows machines, the Kinect for Windows v2 device uses a specialized power adapter to pump additional power to the hardware (there is a splitter in the adapter that attaches the hardware to both a USB port and a wall plug). The Xbox One, being proprietary hardware, is able to pump enough juice to its Kinect sensor without needing a special adapter. Additionally, the firmware for the original Kinect for Windows v1 sensor diverged over time from the Kinect for Xbox’s firmware – which led to differences in how the two versions of the hardware performed. It is now clear that this will not happen with Kinect v2.

Besides the four hardware devices and their respective firmware, the loose term “Kinect” can also refer to the software APIs used to incorporate Kinect functionality into a software program. Prior to this, there was a Kinect for Windows SDK 1.0 through 1.8 that was used to program against the original Kinect for Windows sensor. For the Kinect for Xbox One with the Kinect for Windows Adapter, you will want to use the Kinect for Windows SDK 2.0. “For Windows” is still part of the title for now, even though you will be using it with a Kinect for Xbox One. Of course, you can still use it with the Kinect for Windows v2 sensor if you happen to have bought one of those prior to their discontinuation. There are also other SDKs floating around such as OpenNI and Libfreenect.

[Much gratitude to Kinect MVP Bronwen Zande for helping me get the details correct.]


Top 21 HoloLens Ideas

holo

The image above is a best guess at the underlying technology being used in Microsoft’s new HoloLens headset. It’s not even that great a guess since the technology appears to still be in the prototype stage. On the other hand, the product is tied to the Windows 10 release date, so we may be seeing a consumer version – or at the very least a dev version – sometime in the fall.

Here are some things we can surmise about HoloLens:

a) the name may change – HoloLens is a good product name but isn’t quite where we might like it to be, in a league with Kinect, Silverlight or Surface for branding genius. In fact, Surface was such a good name, it was taken from one product group and simply given to another in a strange twist on the build vs buy vs borrow quandary. On the other hand, HoloLens sounds more official than the internal code name, Baraboo — isn’t that a party hippies throw themselves in the desert?

johnny mnemonic

b) this is augmented reality rather than virtual reality. Facebook’s Oculus Rift, which is an immersive fully digital experience, is an example of virtual reality. Fictional examples include The Oasis from Ernest Cline’s Ready Player One, the Metaverse from Neal Stephenson’s Snow Crash, William Gibson’s Cyberspace and the VR simulation from The Lawnmower Man. Augmented reality involves overlaying digital experience on top of the real world. This can be accomplished using holography, transparent displays, or projectors. A great example of projector based AR is the RoomAlive project by Hrvoje Benko, Eyal Ofek and Andy Wilson at Microsoft Research. HoloLens uses glasses or a head-rig – depending on how generous you feel – to implement AR. Magic Leap – with heavy investment from Google – appears to be doing the same thing. The now dormant Google Glass was neither AR nor VR, but was instead a heads-up display.

kgirl

c) HoloLens uses Kinect technology under the plastic covers. While the depth sensor in the Kinect v2 has a field of view of 70 degrees by about 60 degrees, the depth capability in HoloLens is reported to include a field of view of 120 degrees by 120 degrees. This indicates that HoloLens will be using the Time-of-Flight technology used in Kinect v2 rather than the structured light from Kinect v1. This setup requires both an IR emitter and a depth camera, combined with sophisticated timing and phase technology, to calculate depth efficiently and relatively inexpensively.
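To give a feel for what “timing and phase” means here, below is a minimal Python sketch of the textbook relationship used by continuous-wave time-of-flight sensors: the emitter modulates its IR light at a known frequency, and the phase lag of the returning light maps to distance. This is the general principle only, not a description of how Kinect v2 or HoloLens actually implement it, and the 80 MHz modulation frequency is a hypothetical value chosen for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def unambiguous_range_m(modulation_hz):
    """Largest distance measurable before the phase wraps past one full cycle."""
    return SPEED_OF_LIGHT_M_S / (2.0 * modulation_hz)


def depth_from_phase(phase_shift_rad, modulation_hz):
    """Distance implied by the phase shift between emitted and returned IR light."""
    # Phase is only known modulo one cycle, so wrap it into [0, 2*pi).
    wrapped = phase_shift_rad % (2.0 * math.pi)
    # The light travels out and back, hence 4*pi rather than 2*pi below.
    return SPEED_OF_LIGHT_M_S * wrapped / (4.0 * math.pi * modulation_hz)


if __name__ == "__main__":
    f_mod = 80e6  # hypothetical modulation frequency, not a published device figure
    print(f"Unambiguous range: {unambiguous_range_m(f_mod):.2f} m")
    print(f"Depth at a quarter-cycle shift: {depth_from_phase(math.pi / 2, f_mod):.3f} m")
```

The wrap-around in `depth_from_phase` is why a single modulation frequency cannot tell a near surface from one a full cycle farther away; real sensors need extra tricks to resolve that ambiguity.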

hands

d) the depth camera is being used for multiple purposes. The first is for gesture detection. One of the issues that faced both Oculus and Google Glass was that they were primarily display technologies. But a computer monitor is useless without a keyboard or mouse. Similarly, Oculus and Glass needed decent interaction metaphors. Glass relied primarily on speech commands and tapping and clicking. Oculus had nothing until their recent acquisition of NimbleVR. NimbleVR provides a depth camera optimized for hand and finger detection over a small range. This can be mounted in front of the Oculus headset. Conceptually, this allows people to use hand gestures and finger manipulations in front of the device. A virtual hand can be created as an affordance in the virtual world of the Oculus display, allowing users to interact with virtual objects and virtual interactive menus in virtro.

The depth sensor in HoloLens would work in a similar way except that instead of a virtual hand as affordance, it’s just your hand. You will use your hand to manipulate digital objects displayed on the AR lenses or to interact with AR menus using gestures.

An interesting question is how many IR sensors are going to be on the HoloLens device. From the pictures that have been released, it looks like we will have a color camera and a depth sensor for each eye, for a total of two depth cameras and two RGB cameras located near the joint between the lenses and the headband.

holo_minecraft

e) HoloLens is also using depth data for 3D reconstruction of real world surfaces. These surfaces are then used as virtual projection surfaces for digital textures. Finally, the blitted image is displayed on the transparent lenses.

ra1

ra_2

This sort of reconstruction is a common problem in projection mapping scenarios. A great example of applying this sort of reconstruction can be found in the Halloween edition of Microsoft Research’s RoomAlive project. In the first image above, you are seeing the experience from the correct perspective. In the second image, the image is captured from a different perspective than the one that is being projected. From the incorrect perspective, it can be seen that the image is actually being projected on multiple surfaces – the various planes of the chair as well as the wall behind it – but foreshortened digitally and even color corrected to make the image appear cohesive to a viewer sitting at the correct position. One or more Kinects must be used to calibrate the projections appropriately against these multiple surfaces. If you watch the full video, you’ll see that Kinect sensors are used to track the viewer as she moves through the room and the foreshortening / skewing occurs dynamically to adjust to her changing position.

The Minecraft AR experience being used to show the capabilities of HoloLens requires similar techniques. The depth sensor is required not only to calibrate and synchronize the digital display to line up correctly with the table and other planes in the room, but also to constantly adjust the display as the player moves around the room.

eye-tracking

f) are the display lenses stereoscopic or holographic? At this point no one is completely sure, though indications are that this is something more than the stereoscopic display technique used in the Oculus Rift. While a stereoscopic display will create the illusion of depth and parallax by creating a different image for each lens, something holographic would actually be creating multiple images per lens and smoothly shifting through them based on the location of each pupil staring through its respective lens and the orientation and position of the player’s head.

One way of achieving this sort of holographic display is to have multiple layers of lenses pressed against each other and to use interference to shift the light projected into each pupil as the pupil moves. It turns out that the average person’s pupils typically move around rapidly in saccades, mapping and reconstructing images for the brain, even though we do not realize this motion is occurring. Accurately capturing these motions and shifting digital projections appropriately to compensate would create a highly realistic experience typically missing from stereoscopic reconstructions. It is rumored in the industry that Magic Leap is pursuing this type of digital holography.

On the other hand, it has also been reported that HoloLens is equipped with eye-tracking cameras on the inside of the frames, apparently to aid with gestural interactions. It would be extremely interesting if Microsoft’s route to achieving true holographic displays involved eye-tracking combined with a high display refresh rate rather than coherent light interference display technology as many people assume. Or, then again, it could just be stereoscopic displays after all.

occlusion

g) occlusion is generally considered a problem for interactive experiences. For augmented reality experiences, however, it is a feature. Consider a physical-to-digital interaction in which you use your finger/hand to manipulate a holographic menu. The illusion we want to see is of the hand coming between the player’s eyes and the digital menu. The player’s hand should block and obscure portions of the menu as he interacts with it.

The difficulty with creating this illusion is that the player’s hand isn’t really between the menu and the eyes. Really, the player’s hand is on the far side of the menu, and the menu is being displayed on the HoloLens between the player’s eyes and his hand. Visually, the hologram of the menu will bleed through and appear on top of the hand.

In order to re-establish the illusion of the menu being located on the far side of the hand, we need depth-sensors to accurately map an outline of the hand and arm and then cut a hand and arm shape out of the menu where the hand should be occluding it. This process has to be repeated as the hand moves in real-time and it’s kind of a hard problem.
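The cut-out step can be sketched as a per-pixel depth comparison. The following Python is a toy illustration under some loud assumptions: the depth map is already aligned pixel for pixel with the display, the hologram floats at a single fixed distance, and the function name `occlude` is invented for this example. A real device would have to do this per eye, per frame, with noisy depth data.

```python
def occlude(hologram, hologram_depth_m, scene_depth_m):
    """Blank out hologram pixels that sit behind something real.

    hologram: 2D list of pixel values, None where nothing is drawn.
    hologram_depth_m: distance at which the hologram should appear to float.
    scene_depth_m: 2D list of measured depths (meters), 0 where there is no reading.
    """
    result = []
    for pixel_row, depth_row in zip(hologram, scene_depth_m):
        out_row = []
        for pixel, depth in zip(pixel_row, depth_row):
            # A valid reading closer than the hologram means a real object is in front.
            real_object_in_front = 0 < depth < hologram_depth_m
            out_row.append(None if real_object_in_front else pixel)
        result.append(out_row)
    return result


if __name__ == "__main__":
    menu = [["M", "M", "M"], ["M", "M", "M"]]
    # A hand 0.4 m away fills the middle column; the menu floats at 1 m.
    depth = [[2.0, 0.4, 2.0], [2.0, 0.4, 0.0]]
    print(occlude(menu, 1.0, depth))  # [['M', None, 'M'], ['M', None, 'M']]
```

Pixels with no depth reading are left drawn here, which is one possible choice; treating them as occluded instead would trade visible menu for fewer bleed-through artifacts.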

borg

h) misc sensors: best guess is that in addition to depth sensors, color cameras and eye-tracking cameras, we’ll also get a directional microphone, gyroscope, accelerometer and magnetometer. Some sort of 3D sound has been announced, so it makes sense that there is a directional microphone or microphone array to complement it. This is something that is available on both the Kinect v1 and Kinect v2. The gyroscope, accelerometer and magnetometer are also guesses – but the Oculus hardware has them to track quick head movements, head position and head orientation. It makes sense that HoloLens will need them also.

 bono

i) the current form factor looks a little big – bigger than the Magic Leap is supposed to be but smaller than the current Oculus dev units. The goal – really everyone’s goal, from Microsoft to Facebook to Google – is to continue to bring down the size of sensors so we can eventually have heavy glasses rather than light-weight head gear.

j) vampires, infrared sensors and transparent displays are all sensitive to direct sunlight. This consideration can affect the viability of some AR scenarios.

k) like all innovative technologies, the success of HoloLens will depend primarily on what people use it for. The myth of the killer app is probably not very useful anymore, but the notion that you need an app store to sell a device is a generally accepted universal constant. The success of the HoloLens will depend on what developers build for it and what consumers can imagine doing with it.

 

 

Top 21 Ideas

Many of these ideas are borrowed from other VR and AR technology. In most cases, HoloLens will simply provide a better way to implement these notions. These ideas come from movies, from art installations, and from many years working at an innovative marketing agency where we prototyped these ideas day in and day out.

 

1. Shopping

shopping

Amazon made one click shopping make sense. Shopping and the psychology of shopping change when we make it more convenient, effectively turning instant gratification into a marketing strategy. Using HoloLens AR, we can remodel a room with virtual furniture and then purchase all the pieces on an interactive menu floating in the air in front of us when we find the configuration we want. We can try and buy virtual clothes. With a wave of the hand we can stock our pantry, stock our refrigerator … wait, come to think of it, with decent AR, do we even need furniture or clothes anymore?

2. Gaming

 illumiroom

IllumiRoom was a Microsoft project that never quite made it to product but was a huge hit on the web. The notion was to extend the Xbox One console with projections that reacted to what was occurring in the game but could also extend the visuals of the game into the entire living room. IllumiRoom (which I was fortunate enough to see live the last time I was in Redmond) also uses a Kinect sensor to scan the room in order to calibrate projection mapping onto surfaces like bookshelves, tables and potted plants. As you can guess, this is the same team that came up with RoomAlive. A setup that includes a $1,500 projector and a Kinect is a bit complicated, especially when a similar effect can now be created using a single unit HoloLens.

hud

The HoloLens device could also be used for in-game Heads-Up notifications or even as a second screen. It would make a lot of sense if Xbox integration is on the roadmap and would set Xbox apart as the clear leader in the console wars.

3. Communication

sw

‘nuff said.

4. Home Automation

clapper

Home automation has come a long way and you can now easily turn lights on and off with your smart phone from miles away. Turning your lights on and off from inside your own house may still involve actually touching a light switch. Devices like the Kinect have the limitation that they can only sense a portion of a room at a time. Many ideas have been thrown out to create better gesture recognition sensors for the home, including using wifi signals that go through walls to detect gestures in other rooms. If you were actually wearing a gestural device around with you, this would no longer be a problem. Point at a bulb, make a fist, “put out the light, and then put out the light” to quote the Bard.

5. Education

Microsoft-future-vision

While cool visuals will make education more interesting, the biggest benefit of HoloLens for education is simple access. Children in rural areas in the US have to travel long distances to achieve a decent education. Around the world, the problem of rural education is even worse. What if educators could be brought to the children instead? This is one of the stated goals of Facebook’s purchase of Oculus, and HoloLens can do the same job just as well and probably better.

6. Medical Care

bodyscan

Technology can be used for interesting diagnostic and rehabilitation functions. The depth sensors that come with HoloLens will no doubt be used in these ways eventually. But like education, one of the great problems in medical care right now is access. If we can’t bring the patient to the doctor, let’s bring the GP to the patient and do regular check ups.

7. Holodeck

matrix-i-know-kung-fu

The RoomAlive project points the way toward building a Holodeck. All we have to do is replace Kinect sensors with HoloLens sensors, projectors with holographic displays, and then try not to break the HoloLens strapped to our heads as we learn Kung Fu.

8. Windows

window

Have you ever wished you could look out your window and be somewhere else? HoloLens can make that happen. You’ll have to block out natural light by replacing your windows with sheetrock, but after that HoloLens can give you any view you want.

fifteen-million-merits

But why stop at windows? You can digitize all your walls if you want, and HoloLens’ depth technology will take care of the rest.

9. Movies and Television

vr-cinema-3d

Oculus Rift and Samsung Gear VR have apps that let you watch movies in your own virtual theater. But wouldn’t it be more fun to watch a movie with your entire family? With HoloLens we can all be together on the couch but watch different things. They can watch Barney on the flatscreen while I watch an overlay of Meet the Press superimposed on the screen. Then again, with HoloLens maybe I could replace my expensive 60” plasma TV with a piece of cardboard and just watch that instead.

10. Therapy

whitenoise

It’s commonly accepted that white noise and muted colors relax us. Controlling our environment helps us to regulate our inner states. Behavioral psychology is based on such ideas and the father of behavioral psychology, B. F. Skinner, even created the Skinner box to research these ideas – though I personally prefer Wilhelm Reich’s Orgone box. With 3D audio and lenses that extend over most of your field of view, HoloLens can recreate just the right experience to block out the world after a busy day and just relax. shhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.

11. Concerts

burning-man-festival-nevada

Once a year in the Nevada desert a magical music festival is held called Baraboo. Or, I don’t know, maybe it’s held in Tennessee. In any case, getting to festivals is really hard and usually involves being around people who aren’t wearing enough deodorant, large crowds, and buying plastic bottles of water for $20. Wouldn’t it be great to have an immersive festival experience without all the things that get in the way? Of course, there are those who believe that all that other stuff is essential to the experience. They can still go and be part of the background for me.

12. Avatars

avatar-body

Gamification is a huge buzzword at digital marketing agencies. Undergirding the hype is the realization that our digital and RL experiences overlap and that it sometimes can be hard to find the seams. Vernor Vinge’s 2001 novella Fast Times at Fairmont High draws out the implications of this merging of physical and digital realities and the potential for the constant self reinvention we are used to on the internet bleeding over into the real world. Why continue with the name your parents gave you when you can live your AR life as ByteMst3r9999? Why be constrained by your biological appearance when you can project your inner self through a fun and bespoke avatar representation? AR can ensure that other people only see you the way that you want them to.

13. Blocking Other People’s Avatars

BlackBlock

The flip side of an AR society invested in an avatar culture is the ability to block people who are griefing us. Parents can call a time out and block their children for ten-minute periods. Husbands can block their wives. We could all start blocking our co-workers on occasion. For serious offenses, people face permanent blocking as a legal sanction for bad behavior by the game masters of our augmented reality world. The concept was brilliantly played out in the recent Black Mirror Christmas special starring Jon Hamm. If you haven’t been keeping up with Black Mirror, go check it out. I’ll wait for you to finish.

14. Augmented Media

fiducial

Augmented reality today typically involves a smart phone or tablet and a fiducial marker. The fiducial is a tag or bar code that indicates to the app on your phone where an AR experience should be placed. Typically you’ll find the fiducial in a magazine ad that encourages you to download an app to see the hidden augmented content. It’s novel and fun. The problem involves having to hold up your tablet or phone for a period of time just to see what is sometimes a disappointing experience. It would be much more interesting to have these augmented media experiences always available. HoloLens can be always on and searching for these types of augmented experiences as you read the latest New Yorker or Wired. They needn’t be confined to ads, either. Why can’t the whole magazine be filled with AR content? And why stop at magazines? Comic books with additional AR content would change the genre in fascinating ways (Marvel’s online version already offers something like this, though rudimentary). And then imagine opening a popup book where all the popups are augmented, a children’s book where all the illustrations are animated, or a textbook that changes on the fly and updates itself every year with the latest relevant information – a Kindle on steroids. You can read about that possibility in Neal Stephenson’s Diamond Age – only available in non-augmented formats for now.

15. Terminator Vision

robocop

This is what we thought Google Glass was supposed to provide – but then it didn’t. That’s okay. With vision recognition software and the two RGB cameras on HoloLens, you’ll never forget a name again. Instant information will appear telling you about your surroundings. Maps and directions will appear when you gesture for them. Shopping associates will no longer have to wing it when engaging with customers. Instead, HoloLens will provide them with conversation cues and decision trees that will help the associate close the sale efficiently and effectively. Dates will be more interesting as you pull up the publicly available medical, education and legal histories of anyone who is with you at dinner. And of course, with the heartbeat monitor and ability to detect small fluctuations in skin tone, no one will ever be able to lie to you again, making salary negotiations and buying a car a snap.

16. Wealth Management

hololens11140

With instant tracking of the DOW, S&P and NASDAQ along with a gestural interface that goes wherever you go, you can become a day trader extraordinaire. Lose and gain thousands of dollars with a flick of your finger.

17. Clippit

clippit

Call him Jarvis if it helps. Some sort of AI personal assistant has always been in the cards. Immersive AR will make it a reality.

18. Impossible UIs

minority_report

phone

3dtouch

cloud atlas floating computer

I don’t watch movies the way other people do. Whenever I go to see a futuristic movie, I try to figure out how to recreate the fantasy user experiences portrayed in it. Minority Report is an easy one – it’s a large area display, possibly projection, with Kinect-like gestural sensors. The communication device from the Total Recall reboot is a transparent screen and either capacitive touch or more likely a color camera doing blob recognition. The 3D touchscreen from Pacific Rim has always had me stumped. Possibly some sort of Leap Motion device attached to a Pepper’s Ghost display? The one fantasy UX I could never figure out until I saw HoloLens is the “Orison” computer made up of floating disks in Cloud Atlas. The Orison screens are clearly digital devices in a physical space – beautiful, elegant, and the sort of intuitive UX for which we should strive. Until now, they would have been impossible to recreate. Now, I’m just waiting to get my hands on a dev device to try to make working Orison displays.

19. Wiki World

wikipedia

Wiki World is a simple extension of terminator vision. Facts floating before your eyes, always available, always on. No one will ever have to look up the correct spelling for a word again or strain his memory for a sports statistic. What movie was that actor in? Is grouper ethical to eat? Is JavaScript an object-oriented language? Wiki World will make memorization obsolete and obviate all arguments – well, except for edit wars between Wikipedia editors, of course.

20. Belief Circles

wwc

Belief circles are a concept from Vernor Vinge’s Hugo award winning novel Rainbows End. Augmented reality lends itself to self-organizing communal affiliations that will create inter-subjective realities that are shared. Some people will share sci-fi themes. Others might go the MMO route and share a fantasy setting with a fictional history, origin story, guilds and factions. Others will prefer football. Some will share a common religion or political vision. All of these belief circles will overlap and interpenetrate. Taking advantage of these self-generating belief circles for content creation and marketing will open up new opportunities for freelance creatives and entrepreneurs over the next ten years.

21. Theater of Memory

camillo1.gif

Giulio Camillo’s memory theater belongs to a long tradition of mnemonic technology going back to Roman times and used by orators and lawyers to memorize long speeches. The scholar Frances Yates argued that it also belonged to another Renaissance tradition of neoplatonic magic that has since been superseded by science in the same way that memory technology has been superseded by books, magazines and computers. What Frances Yates – and after her Ioan Couliano – tried to show, however, was that in dismissing these obsolete modes of understanding the world, we also lose access to a deeper, metaphoric and humanistic way of seeing the world and are the poorer for it. The theater of memory is like Feng Shui – also discredited – in that it assumes that the way we construct our surroundings also affects our inner lives and that there is a sympathetic relationship between the macrocosm of our environment and the microcosm of our emotional lives. I’m sounding too new agey so I’ll just stop now. I will be creating my own digital theater of memory as soon as I can, though, as a personal project just for me.

Projecting Augmented Reality Worlds

WP_20141105_11_05_56_Raw

In my last post, I discussed the incredible work being done with augmented reality by Magic Leap. This week I want to talk about implementing augmented reality with projection rather than with glasses.

To be more accurate, varieties of AR experiences are often projection based. The technical differences depend on which surface is being projected on. Google Glass projects on a surface centimeters from the eye. Magic Leap is reported to project directly on the retina (virtual retinal display technology).

AR experiences being developed at Microsoft Research, which I had the pleasure of visiting this past week during the MVP Summit, are projected onto pre-existing rooms without the need to rearrange the room itself. Using fairly common projection mapping techniques combined with very cool technology such as the Kinect and Kinect v2, the room is scanned and appropriate distortions are created to make projected objects look “correct” to the observer.

An important thing to bear in mind as you look through the AR examples below is that they are not built using esoteric research technology. These experiences are all built using consumer-grade projectors, Kinect sensors and Unity 3D. If you are focused and have a sufficiently strong desire to create magic, these experiences are within your reach.

The most recent work created by this group (led by Andy Wilson and Hrvoje Benko) is a special version of RoomAlive they created for Halloween called The Other Resident. Just to prove I was actually there, here are some pictures of the lab along with the Kinect MVPs amazed that we were being allowed to film everything given that most of the MVP Summit involves NDA content we are not allowed to repeat or comment on.

WP_20141105_004

WP_20141105_016

WP_20141105_013

 

IllumiRoom is a precursor to the more recent RoomAlive project. The basic concept is to extend the visual experience on the gaming display or television with extended content that responds dynamically to what is seen onscreen. If you think it looks cool in the video, please know that it is even cooler in person. And if you like it and want it in your living room, then comment on this thread or on the youtube video itself to let them know it is definitely an M viable product for the XBox One, as the big catz say.

The RoomAlive experience is the crown jewel at the moment, however. RoomAlive uses multiple projectors and Kinect sensors to scan a room and then use it as a projection surface for interactive, procedural games: in other words, augmented reality.

A fascinating aspect of the RoomAlive experience is how it handles appearance preserving point-of-view dependent visualizations: the way objects need to be distorted in order to appear correct to the observer. In the Halloween experience at the top, you’ll notice that the animation of the old crone looks like it is positioned in front of the chair she is sitting on even though the projection surface is actually partially extended in front of the chair back and at the same time extended several feet behind the chair back for the shoulders and head.  In the RoomAlive video just above you’ll see the view dependent visualization distortion occurring with the running soldier changing planes at about 2:32.
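The heart of that view dependent distortion can be sketched with a little geometry: to make a virtual point appear at a given spot in the room, you draw it where the line from the viewer's eye through that spot hits the real surface. The Python below is only a minimal sketch under simplifying assumptions (one flat surface, a known eye position, made-up coordinates in meters). The function name `project_to_surface` is hypothetical and is not taken from RoomAlive, which has to deal with scanned, irregular room geometry.

```python
def project_to_surface(eye, virtual_point, plane_point, plane_normal):
    """Find where to draw a virtual point on a flat surface.

    Casts a ray from `eye` through `virtual_point` and returns the spot where
    that ray meets the plane given by `plane_point` and `plane_normal`.
    Returns None if the ray never reaches the surface.
    """
    direction = tuple(v - e for v, e in zip(virtual_point, eye))
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # line of sight runs parallel to the surface
    offset = tuple(q - e for q, e in zip(plane_point, eye))
    t = sum(o * n for o, n in zip(offset, plane_normal)) / denom
    if t <= 0:
        return None  # the surface is behind the viewer
    return tuple(e + t * d for e, d in zip(eye, direction))


# Hypothetical scene: a wall at z = 4 and a virtual point floating 2 m out.
wall_point, wall_normal = (0.0, 0.0, 4.0), (0.0, 0.0, -1.0)
ghost = (0.0, 1.0, 2.0)

seen_from_center = project_to_surface((0.0, 1.5, 0.0), ghost, wall_point, wall_normal)
seen_from_side = project_to_surface((1.0, 1.5, 0.0), ghost, wall_point, wall_normal)
```

In this toy scene the two viewers need the same virtual point drawn a full two meters apart on the wall, which is why a projection tuned for one observer looks wrong to anyone standing elsewhere.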

 

You would think that these appearance preserving PDV techniques will fall apart anytime you have more than one person in the room. To address this problem, Hrvoje and Andy worked on another project that plays with perception and physical interactions to integrate two overlapping experiences in a Wizard Battle scenario called Mano-a-Mano or, more technically, Dyadic Projected Spatial Augmented Reality. The globe visualization at 2:46 is particularly impressive.

My head is actually still spinning following these demos and I’m still in a bit of a fugue state. I’ve had the opportunity to see lots of cool 3D modeling, scanning, virtual experiences, and augmented reality experiences over the past several years and felt like I was on top of it, but what MSR is doing took me by surprise, especially when it was laid out sequentially as it was for us. A tenth of the work they have been doing over the past two years could easily be the seed of an idea for any number of tech startups.

In the middle of the demos, I leaned over to one of the other MVPs and whispered in his ear that I felt like Steve Jobs at Xerox PARC seeing the graphical user interface and mouse for the first time. He just stroked his beard and nodded. It was a magic moment.

MSR Mountain View and Kinect

Just before the start of the weekend, Mary Jo Foley broke the story that the Mountain View lab of Microsoft Research was being closed.  Ideally, most of the researchers will be redistributed to other locations and not be casualties of the most recent round of layoffs.

The Kinect sensor is one of the great examples of Microsoft Research successfully working well with a product team to bring something to market.  Researchers from around the world worked on Project Natal (the code-name for Kinect).  An extremely important contribution to the machine learning required to make skeleton tracking work on the Kinect was made in Mountain View.

Machine learning works best when you are dealing with lots of data.  In the case of skeleton tracking, millions of images had been gathered.  But how do you find the hardware to process that many images?

Fortunately, the Mountain View group specialized in distributed computing.  One researcher in particular, Mihai Budiu, worked on a project that he believed would help the Project Natal team to solve one of its biggest hurdles.  The project was called DryadLINQ and could be used to coordinate parallel processing over a large server cluster.  The problem it solved was recognizing body parts for people of various sizes and shapes – a preliminary step to generating the skeleton view.
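To give a rough feel for the pattern involved (split the data, process the slices in parallel, merge the partial results), here is a toy Python sketch. It is not DryadLINQ and does not use its API. The threads stand in for machines in a cluster, and the "frames" are made-up lists of body-part labels rather than real depth images. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts roughly equal slices."""
    return [items[i::n_parts] for i in range(n_parts)]


def count_labels(frames):
    """Map step: tally how often each label appears in one slice."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)
    return counts


def parallel_label_counts(frames, n_workers=4):
    """Run the map step on every slice concurrently, then merge the tallies."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_labels, partition(frames, n_workers)):
            total.update(partial)  # reduce step: combine partial results
    return total


# Made-up stand-ins for labeled training images.
frames = [["head", "hand", "hand"], ["torso", "head"], ["hand"]]
totals = parallel_label_counts(frames, n_workers=2)
```

The point of a system like the one described above is that the same split, process and merge shape keeps working when the slices are millions of images spread over many servers instead of three short lists on one laptop.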

The research lab at Mountain View was an essential part of the Kinect story.  It will be missed.

Ghost Hunting with Kinect

Paranormal Activity 4

I don’t usually try to undersell the capabilities of the Kinect.  Being a Microsoft Kinect for Windows MVP, I actually tend to promote all the things that Kinect currently does and one day will do.  In fact, I have a pretty big vision of how Kinect, Kinect 2, Leap Motion, Intel’s Perceptual Computing camera and related gestural technologies will change the way we interact with our environment.

Having said that, let me just add that Kinect cannot find ghosts.  It might reveal bugs in the underlying Kinect software – but it cannot find ghosts.

Nevertheless, “experts” are apparently using Kinect sensors to reveal the presence of ghosts.  Here’s a clip from Travel Channel’s Ghost Adventures.  It’s an episode called Cripple Creek and you’ll want to skip ahead to about 3:50 (ht to friend Josh Blake for finding this).

The logic of this is based on some very sophisticated algorithms the Kinect uses to identify “skeletons” – or outlines of the human form.  The current Kinect can spot two skeletons at a time including up to 20 joints on each skeleton.  Additionally, it has a “seated mode” that allows it to identify partial skeletons from about the waist up – this tends to be a little more dodgy though.  All of this skeleton information is provided primarily to allow developers to create games that track the human body and, typically, animate an onscreen avatar that emulates the player’s movements.

The underlying theory behind using it for ghost hunting is that, since when someone passes in front of the Kinect sensor the Kinect will typically register a skeleton, it follows that if the Kinect registers a skeleton someone must have passed in front of it.

skeleton

Unfortunately, this is not really the case.  There are lots of forum posts from developers asking how to work around peculiarities with the Kinect skeletons, while anyone who has played a Kinect game on XBox has probably noticed that the sensor will occasionally provide false positives (which, for gaming, is ultimately better than false negatives).  In fact, even my dog would sometimes register as a skeleton when he ran in front of me while I was playing.
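The gap between "people usually produce a skeleton" and "a skeleton means a person" can be put into numbers with Bayes' rule. The Python below is a small worked example, and every figure in it is invented for illustration. I am assuming a 95% detection rate, a 5% false positive rate, and a room that is empty 99% of the time. None of these are measured Kinect statistics.

```python
def probability_someone_is_there(prior, hit_rate, false_positive_rate):
    """Bayes' rule: chance a real body is present, given a skeleton was reported.

    prior               -- chance a person is in view before looking at the sensor
    hit_rate            -- chance of a skeleton when a person really is there
    false_positive_rate -- chance of a skeleton when nobody is there
    """
    true_alarms = hit_rate * prior
    false_alarms = false_positive_rate * (1.0 - prior)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical numbers for an empty building at night.
p = probability_someone_is_there(prior=0.01, hit_rate=0.95, false_positive_rate=0.05)
print(f"Chance the skeleton is a real body: {p:.0%}")
```

With these made-up inputs the answer comes out at roughly one in six: even a sensor that is right most of the time will mostly be reporting false positives when it is pointed at a room that is almost always empty.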

Perhaps you’ve also noticed that in an oddly shaped room, Kinect is prone to register false speech commands.  This happens to me especially when I’m trying to watch my favorite ghost hunting show on Netflix – probably because of the feedback from the television itself (which the Kinect tends to be very good at cancelling out if you take the trouble to configure it according to instructions – but I don’t).  I know this isn’t a ghost pausing my TV show, though, because the Kinect isn’t set up to hear anything I don’t hear.  Just because the Kinect emulates some human features – like following simple voice commands like “Play” and “Pause” – doesn’t mean it’s something from The Terminator, The Matrix or Minority Report.  It is no more psychic than I am and it doesn’t have super hearing.

Kinect 2 IR

Similarly, skeleton tracking on Kinect isn’t specially fitted to see invisible things.  It uses a combination of an infrared camera and a color camera to collect data which it interprets as a human structure.  But these cameras don’t see anything the human eye can’t see with the lights on.  Those light photons that are being collected by the sensors still have to bounce off of something visible, even if you can’t see the light beams themselves.  Perhaps part of the illusion is that, because we can’t see the infrared light being emitted and collected by the Kinect, people assume that what it detects also can’t be seen?

Here’s another episode of Ghost Adventures on location at the haunted Tuolumne Hospital.  It’s especially remarkable because the Kinect here is doing exactly what it is expected to do.  As the subject lifts himself off the bed, he separates his outline from the background and Kinect for Windows’ “seated mode” identifies his partial skeleton from approximately the waist up.  The intrepid ghost hunters then scream out “It was in your gut!”  Television gold.

Apparently the use of unfamiliar (and misunderstood) technology provides a veneer of seriousness to what these people do on their shows.  Another piece of weird technology all these shows use is something called EVP – electronic voice phenomena.  Here the idea is that you put out a tape recorder or digital recorder and let it run for a while – often with a white noise machine in the background.  Then you play it back later and you start hearing things you didn’t hear at the time.  The trick is that if you run these recordings through software intended to clean up audio in order to discover voices, they remarkably discover voices that you never heard but which must be the voice of ghosts.

I can’t help feeling, however, that it isn’t the world of extrasensory phenomena that is mysterious and baffling to us.  It’s all the crazy new technologies appearing every day that are truly supernatural and overwhelming.  Perhaps tying all of these frightening technologies to our traditional myths and collective superstitions is just a way of making sense of it all and normalizing it.

Got an Image Enhancer that can Bitmap?

Every UI platform needs a killer concept.  For the keyboard and mouse it was the Excel sheet.  If you ever watch the rebooted Hawaii Five-0, you’ll realize that for Touch it’s the flick.  Flicking is more satisfying than tapping on sooo many levels.  Birds do it, bees do it, even monkeys in the trees do it.

Gestural interfaces haven’t found that killer concept yet, but it may just be the ability to zoom in on an image.  Like flicking and entering tabular data, killer concepts don’t necessarily have to be clever.  They just have to feel right.

Consider what John Anderton spent his time doing in 2002’s Minority Report.  For the most part, he used innovative fantasy technology (later made real at Oblong Industries) to enhance images on his rather large screen.

Go back even further and you’ll recall Rick Deckard used speech recognition to enhance an image in 1982’s Blade Runner.  This may be the first inkling any of us had of the true purpose of NUI.

It obviously left an impression on the zeitgeist because every movie or TV show attempting to demonstrate technological sophistication on the cheap (CSI being the biggest culprit) managed to insert an “enhance” scene into their franchise somewhere.

And if you happened to have a movie with no budget, there was no reason you should let this stop you.

And while we’re getting nostalgic for NUI, let’s not forget to give credit where credit is due. Before Leap Motion, before Microsoft’s Kinect, before Oblong’s g-speak, even before Minority Report, there was the NES Power Glove:

And in the decades after, all we’ve managed to do is to enhance that killer concept.